Toward a Criterion Theory:

A Review and Analysis of Research and Opinion

William W. Ronan
Georgia Institute of Technology
and Erich P. Prien
University of Akron

A literature review dealing with the development and
utilization of work performance criteria has revealed some
basic questions concerning criteria. They are: (1) Is job
performance reliable? (2) Is observation of job performance
reliable? (3) Is job performance unidimensional? (4) Is job
performance modified by extra-individual conditions?

Generally a paucity of research information exists in all
the areas enumerated above. For example, fewer than 25 studies
have investigated directly the important concept of performance
reliability.

It is suggested that enough information is available to
formulate theorems and corollaries and to derive testable
hypotheses. In the concluding section 15 areas of required
research are suggested as fruitful for providing needed answers
to the questions posed.

The "criterion problem" pervades all areas of psychology.
In its most basic form, a criterion is an assumed perfect
and true measure of variability, whether that variability is of
human behavior or some aspect of group or organizational
functioning. For the most part, psychologists have been
concerned with variation which is more or less directly related
to individual differences within specific situations or with
reference to the particular pattern of experimentally
controlled variables. However, the legitimate scope of criterion
investigation includes development of concepts of personality
characteristics and characteristics of group and organizational
functioning. Investigation is also justified of the more
practical problems such as the definition of human emotional
adjustment, dimensions of executive performance, dimensions
of employee job withdrawal behavior, or the definition of
sales performance.

The concern of psychologists and others in research and
practice has been with the more practical matters of development
and measurement within specific situations. In criterion
research, unlike learning theory or personality theory, very
little has been done in the area of individual-situation
interaction which would qualify as basic or pure research
aimed at the development of a theoretical structure. Certainly
under the broad scope of the definition, personality
theory and theories of social interaction come close to
filling this void. However, it is seldom that any effort is
made to bridge the gap between the study of the individual or
group in artificial situations, and variability of behavior
and performance in the world of reality.

Much of the empirical work in the various areas of personnel
psychology has been a matter of expedience, motivated
by the need for solution to a specific problem rather than
by the desire to generate a theoretical framework.
Historically, the emphasis has been on the selection of
the "most noticeable" rather than on the development of the
most appropriate criterion. The tendency has been to accept
what existed rather than to determine both the "necessary"
and "sufficient" standards. Otis (1953) succinctly identifies
the researcher and the practitioner as the culprits in this
respect.

Considerable empirical data have been amassed, but there
have been few attempts to assess these data in total. A
complete survey of the literature in all areas of psychology
is, of course, prohibitive. Admittedly, the need for criterion
research is as present and pressing in other areas of
psychology as it is in personnel and industrial psychology. To
the extent that other areas of psychology overlap with
personnel and industrial psychology, some reference will be made
to existing empirical data in those areas. However, this
review is primarily concerned with the problems of variability
of performance behavior in work situations. The emphasis
is on more objective performance measures, with material on
merit rating included only to clarify specific points.

By our definition, this review is concerned with behaviors
which are limited by operations within specific situations:
operational definitions of behavior variability of
individual-situation interactions or group-situation
interactions. Ultimately the combinations of
individual/situational factors
should lead to definitions of variability within complete
organizations. The ultimate practical solution is the
identification of the antecedent conditions, both the individual
differences and the situational characteristics, which limit,
enhance or inhibit behavior variability. Ultimately we must
understand performance within this context of individual,
situational and organizational variables acting separately and
interacting to affect performance behavior. Our criterion
definition thus is measurements of the manifestations of
performance behavior based upon characteristics of individuals
as they affect and are affected by situational and
organizational characteristics.

Industrial psychology has for many years studied a few
of the possible methods for measuring criteria of job
performance. The result has been the rather wry cliche, "the
criterion problem." A recent statement of this problem was
by Dudek (1963) in the Annual Review of Psychology, i.e.,
"Criterion problems, as usual, received a great deal of
attention -- and some action." An earlier statement by Viteles (1926)
was, "...it requires only a brief survey of the literature
to show that in spite of the recognized importance of reliable
standards and/or recognized precautions in the selection of
such standards, the criteria in individual investigations have
on the whole been very unsatisfactory." Essentially the same
statement is made by Wallace and Weitz (1955) and Haire (1959)
in writing of major findings or problems in industrial
psychology. No writer, though, suggests the probability of isolation
of the problem (if it is a problem, Dunnette 1963a) in the
near future. In general, it appears that attention but little
action will continue to be the rule.

Historical Overview

As might be inferred from Viteles' quotation, attention
had been devoted to the development of adequate measures of
job performance for several years. Link (1919) published one
of the earliest studies wherein ratings of job performance
from two supervisors were secured. Thorndike (1920), based
upon earlier work by Wells (1907), named the "halo effect"
that, to a large degree, accounted for the high correlations
Link obtained, i.e., .82 and .92, in two different groups.
Freyd (1923-24), in a general discussion of vocational selection
problems, discussed the need for job analysis, the importance
of individual differences, the concept of recognizing that
different jobs require different abilities, and that
measurement in these areas was possible. Twelve possible criteria
were named and discussed. Investigations using more objective
criteria than ratings had begun earlier. Yerkes (1921)
presented what appears to be the earliest study using more objective
criteria. The criteria were output and accuracy of graphotype
operators, with a correlation of .11 between the two. Lovett (1923)
published a study on selection of salesmen that was very
sophisticated for the time, and can still be regarded as the
exceptional design. Similarly Kornhauser (1923-24) presented a selection
study of billing machine operators using eight tests and years
of schooling to predict six criteria. This study too had
estimates of reliability and intercorrelations of selected
criteria. Pond (1925-26) presented another of the earlier
studies that, along with the selection basis, made a systematic
study of the reliability and interrelationships of criteria
of job performance. In addition to reliability indices of
four criteria, Pond intercorrelated foremen's ratings with
highest weekly pay. The intercorrelations were of a nature
that has become quite well established since this pioneering
study, i.e., a range from the -.30's to .50's with a median
in the .20's. Her solution was one that has also become all
too common: "These sources of unreliability in the factory
criteria of success were themselves unmeasured, and difficult
to evaluate in any way. There was always the possibility that
in spite of them, significant relationships might be found
between the criteria of success and test scores." Concurrently
with Pond's work, Shellow (1925-26a) was facing the same problems
in studying the selection of street car motormen. She discussed
alternative criteria and, in view of disappointing reliabilities
(intercorrelation .05 between ratings by the "Chief
Instructor" and "member of Educational Department"), finally
decided upon turnover as a criterion. Another early study by
Frey (1925-26) discovered a unique source of criterion bias.
"The sales record itself was found to be an erratic measure
of sales ability because some of the men ran up high records
by selling only to relatives, whereas others of considerable
past experience or apparent aptitude lacked temporarily a
clientele. The sales managers were able to detect the cases
where the sales record was not a valid criterion and make
the necessary adjustments." The "rebate evil" in insurance
sales had been acknowledged for some time prior, and a solution
had been first proposed by Peters (1894) working with S. A. Wood
and the Georgia Life Insurance Company (reported by Gilmer,
1961).

This search for more objective criteria of job performance
had been the result of disappointing studies using rating scales
as criteria. In fact, in the same issue of the Journal of
Personnel Research (1925-26) in which the cited studies appeared, an
article by Kingsbury was opposing the abandonment of rating scales
as criteria. Kingsbury's article suggested that clarifying the
concepts of raters, rater training programs, further improvements
of scales and consideration of the practicability of rating
scales would solve the problems connected with their use.

Hull (1928) devoted an entire chapter (12) to a discussion
of the importance and some concepts of criteria of job performance.
With regard to the former he says, "...to proceed on a
scientific aptitude project without an adequate criterion is hopeless..."
and goes on to present a categorization of criteria as product,
action, and subjective impression. This attempt to conceptualize
and systematize job performance measurement was in contrast to
naming possible criteria, which had been the practice. However,
even this work tacitly supports the usage of a single job
performance measure, ratings, as adequate for criterion purposes.
Shortly after Hull's book appeared, Bird (1931) published what
is probably the earliest study combining more than two
criterion measures and called it an "efficiency index." This
index consisted of salary, number of months employed, salary
increase, number of promotions and ratings by superiors. Today,
the hazards of such a composite are obvious, but for the time
it represented a departure from the use of a single index of
job performance.

The combination of single act or behavior incidents (to
receive much attention later) and estimation of an individual
average or summarized impressions ignores the scale unit and
dimensionality considerations. Early research capitalized
on the occurrence of incidents or single acts, thus avoiding
problems inherent in measurement as well as the abstract problems
of definitions. This particular problem, an artificial
two-category system for classifying, remains today.

Historically the emphasis was placed on easily
identified, specific behaviors or global measures accepted as the
composite measures of goodness. With only minor exceptions,
the practice continues today in the attempts to predict
turnover, lost-time accidents, patent disclosures plus innumerable
other points on the continuum. Little or no effort, then or
now, is devoted to the identification of the basic dimensions.

Looking back on the period, it seems most peculiar that
psychologists did not face the problem of multi-dimensional
criteria sooner because it was apparent that others had. Various
mathematical models were appearing a short time later that must
have been in the germination stages during the period discussed.
For example, in 1936 Edgerton and Kolbe, Horst, and Hotelling
all published studies dealing with combining various criterion
measures into a single measure of performance. Travers (1939)
described the discriminant function and Wherry (1940) an
adaptation of the Edgerton-Kolbe method. All of these studies
had in common the concept that prediction of performance would
require a battery of predictors and description of performance
would require a battery of measures.

It was during this period that Viteles (1936) introduced
a criterion dimension that had received virtually no attention
up to the time and has received comparatively little since. It
was the satisfaction an individual receives from his work, in
contrast to the strictly "economic efficiency" aspects of
job performance. The issue this raised has continued ever
since and only recently has received some consideration as a
criterion measure. Even the recent conceptualizations by
Herzberg, Mausner, and Snyderman (1959) and Brayfield and
Crockett (1955) fail to agree as to the relation of attitudes
and satisfaction of the individual worker to any operationally
defined goals or objectives. To culminate this period, Bellows
(1941) published a study that attempted to systematize the
development of job performance criteria and Horst (1941) edited
what can be regarded as a classic in the field. This latter
study, with many eminent contributors and consultants, was a
compendium of the problems and techniques of prediction.
Written with an eye toward the coming of World War II and its
serious manpower problems, the study discussed the major problems
of prediction of performance and presented the methods for
solution as they were known. The study in fact delineates the
basic problems of criteria development and performance
prediction, many of which are still problems. Emphasized are the
complexity of human activities, the difficulty of defining
success, and that conditions extraneous to the individual
can alter his performance. Consideration of these broad areas,
with their associated sub-areas, implied that extremely complex
criteria would be necessary to measure virtually any activity
with the needed degree of adequacy. Bellows (1941) op. cit.
also delineated some standards by which criteria were to be
evaluated, the more important of which were reliability,
correlation with other criteria and predictors, and acceptability
to the job analyst. Nagle (1953) describes the derivation
of a composite which was rejected by Guion (1961) as a practical
consideration.

World War II brought with it unprecedented opportunities
in the general areas of personnel research. Much of this work
is summarized by Stuit (1947), Flanagan (1948), and Stouffer
et al. (1949). Criterion development received considerable
attention during the course of this war but, under compulsion
of immediate necessity, single criteria were commonly used. For
example, the pilot and navigator criteria were "check ride"
ratings and, for bombardiers, "circular error." These measures
had general reliabilities of about .50, .02, and .18. In the
case of pilots, it is to be noted that the limit of predictive
efficiency had about been reached, as shown by Flanagan's (1946)
classic study. In this experiment 1143 persons were sent
through pilot training regardless of selection test scores.
The multiple correlation for this group with the pass-fail
pilot criterion was .66 which, with a criterion reliability
of about .50, is very near the maximum possible correlation.
It is regrettable that more attention was not given to criterion
development at the time, particularly in view of the fact that
some of the more important concepts in need of evaluation had
been described by Toops (1944). The article indicated the
need for "success profiles" as criteria, primarily because success
in an activity is not unitary and, further, persons can be
successful performers in a given activity for different reasons
and at different times. Otis had earlier described this same
problem in a book edited by Stead, Shartle, et al. (1949).
The detailed resolution was not presented until much later by
Toops (1959). However, military studies generally continued to
use a single performance measure as a criterion.
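
The remark above that a correlation of .66 is very near the maximum possible with a criterion reliability of about .50 can be checked against the classical attenuation bound, under which an observed validity coefficient cannot exceed the square root of the product of the predictor and criterion reliabilities. The Python sketch below is offered only as an illustration of that arithmetic. The function name `max_validity` and the default assumption of a perfectly reliable predictor composite are assumptions made here and are not taken from Flanagan's study.

```python
import math


def max_validity(criterion_reliability, predictor_reliability=1.0):
    """Upper bound on an observed predictor-criterion correlation.

    Classical test theory bound: r_xy <= sqrt(r_xx * r_yy).
    The default predictor reliability of 1.0 is an illustrative
    assumption (a perfectly reliable predictor composite).
    """
    for value in (criterion_reliability, predictor_reliability):
        if not 0.0 <= value <= 1.0:
            raise ValueError("reliabilities must lie between 0 and 1")
    return math.sqrt(criterion_reliability * predictor_reliability)


# Figures quoted in the text: criterion reliability about .50,
# observed multiple correlation .66.
ceiling = max_validity(0.50)
observed = 0.66

print(round(ceiling, 3))             # ceiling implied by the reliability
print(round(observed / ceiling, 3))  # share of the ceiling already reached
```

On these figures the ceiling is roughly .71, so the observed .66 leaves little room for improvement in prediction unless the criterion itself is made more reliable.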

The World War II experience did result in a clearer
conception of and some work in the general area of criterion
development. Stuit and Wilson (1946) published a study showing
the marked "influence of the criterion upon the relationship
between predictive indices and measure of success." The general
point of the study, that continuing attention to better performance
measures results in better predictions of performance, is amply
demonstrated by the results. In a series of studies, Flanagan
(1949, 1954, 1956) had described the conception and refinement
of the "critical incident technique" as a method of criterion
development as contrasted to criterion selection. In the
history of personnel research, this was the first presentation
of a systematic method specifically aimed at isolating the bases
of performance and, from these, working back toward selection
methods. In addition to the critical incident technique,
wartime experience did bring a much clearer recognition and
formulation of the nature and characteristics of performance
criteria. Thorndike (1949) presented a comprehensive discussion
of performance measures. He discussed criteria as immediate,
intermediate and ultimate, criterion relevance, various types
of criteria with their limitations, and considerations for
evaluating criteria. The study covered most of the facets
of criterion development that were and are of importance.

Van Dusen (1947) and Jenkins (1946), in a more limited way,
covered some of the same material based upon military experience.
These studies in criterion development culminated with Nagle
(1953) op. cit., Wherry (1957), and Weitz (1961). The first
brings out again the point that individual job satisfaction has
had virtually no study as a possible performance criterion and
recognizes how introduction of this variable into criterion
measures would further complicate predictive studies. Wherry's
study stresses the lack of systematic attack on criterion
development and he says, "If we are measuring the wrong thing, it
will not help to measure it better," making the general point
of past emphasis on predictors rather than what is to be
predicted. Weitz (1961) op. cit. presented experimental evidence
to show how selection of different criteria (in learning word
associations) materially changes the interpretation of results
and, it is pointed out, that the "laws of criteria" remain to
be discovered. Adkins (1947) during this same period discussed
some of the assumptions that are made about criteria in
predictive studies. One important point was that unless provision
is made for control, motivation, risk, experience, personal
history items, work environment and other such possible
variables are assumed to be equal. On this point, the social
scientist needs to refer to Campbell's (1957) discussion of
experimental design relevant to variables which affect the
outcomes of research. To take one of the variables, motivation,
Eysenck (1963) published an experimental study showing that
unequal motivation can be an extremely important performance
variable and, further, that it has a nonlinear relationship with
performance. It is rare to see a study where the variables
named by Adkins are controlled, although they almost certainly
have some effect on predictor-criterion relationships.

More recently two other methods, by Lawshe and Steinberg
(1955) and Primoff (1957), have approached the evaluation of
job performance by first having competent observers rate
elements of a particular job for importance or "criticalness."
Appropriate predictors are then selected and their relation
to the elements determined. After first determinations,
refinements are continued to approach the highest possible validity
coefficient. This is in contrast to the previously mentioned
"critical incident technique," where the approach is to have
competent observers report behavioral incidents and, from
these, critical requirements are constructed which are to be
predicted.

With all this work, has prediction of job performance
become any more efficient than it was in the earlier studies
cited? A series of studies by Ghiselli and Brown (1951),
Ghiselli and Barthol (1953), Ghiselli (1955) and SaLna et
al. (1959) indicates that prediction, while much more
sophisticated, has shown little noticeable improvement.
A survey of studies regarding trainability showed that various
aptitude tests tended to be predictive of all occupations at
the same level, with intercorrelations estimated at .55,
and a survey of the predictive utility of personality
inventories showed a range of average correlations of .14 to
.36 for eight different categories of occupations. The latter
two articles provide some general discussion of the problems
that have been encountered in criterion development for years.
Such problems as the shortcomings of the various proposed
mathematical models, lack of functional job descriptions, the
search for a composite criterion, the dynamic nature of jobs,
the relation of prior experience to the current job, the
existence and importance of both individual and situational
moderator variables, and how jobs differ in different
establishments are the more important mentioned. However, here and
elsewhere there has been, it appears, a failure to recognize
or properly take into account four fundamental problems in the
evaluation of performance criteria. These are:

(1) Is job performance reliable? The assumption of
reliability is implicit in all predictive studies and must be true
if adequate predictions are to be made.

(2) Is observation of job performance reliable? Since
all evaluations of performance ultimately rest upon
observation of one sort or another, the question of reliability of
such observation becomes crucial to prediction.

(3) Is job performance uni-dimensional? Many studies use
a single measurement of job performance (usually a continuum)
to evaluate the predicted performance; it is critical to know
whether or not such practice can be defended.

(4) Is job performance variability an individual
phenomenon? Almost universally individual abilities, traits and
characteristics are measured and these are related to some
measure of job performance; if there are contingency sources
of variance in job performance, they must be measured or
controlled for meaningful prediction of performance.

Obviously the above questions have all received some
consideration in various research studies. However, it is hoped
that a selective survey of the literature will illustrate their
overall neglect and, at the same time, their importance. In
essence it seems a better understanding of job performance per se
will lead to better performance measurement.

The broader problem introduced by Otis (1940), et al.,
and Bellows (1941) op. cit., and added to by Nagle (1953) op.
cit., Guion (1961) op. cit., Dunnette (1963a, 1963b), and
Weitz (1961) op. cit. is that of criteria for criteria.
Certainly practical matters of prediction are of concern, but
ultimately some resolution of the abstract problem of
definitions and principles must be made.

Is Job Performance Reliable?

Since job performance reliability is fundamental to
personnel research, it is disconcerting to find that so few
studies have been conducted with the specific aim of
determining performance reliability. In addition, many of these
have been aimed at determining the reliability of limited
aspects or single tasks of a particular job. The task is
extremely difficult when the results are intangible or when there
is a delay of impact of job performance.

Individual performance variability received some early
laboratory attention. Seashore (1931) administered eight motor
tests to 50 subjects and, for three five-minute cycles 43
hours apart, the reliabilities ranged from .75 to .94. It is
probable that these results were inflated by learning, but they
illustrate the fact that individual performances vary in
reliability. Anastasi (1934) selected 250 Ss from an original
group of 1000 who were below the first quartile on four tests
of a verbal-symbolic nature. The correlations of initial and
final scores ranged from .30 to .61 and one of the main findings
of the study was that individual variability increased as the
trials continued even though individuals maintained their
same relative positions. Hertzman (1939) matched two groups
of 40 each for general level of ability on the Thurstone
Substitution Test but selected one group for high variability and
the other for low over the entire test. The two groups varied
widely from each other with respect to within-group
correlations on subsequent trials, with the correlations of the low
variability group far more homogeneous than those of the high
variability group. Another interesting point was that as the trials
continued, the intercorrelations in both groups showed a steady
decline. Taylor, Munson, and Stone (1945) likewise show an
orderly decrement in test intercorrelation as a function of
the separation interval. In this study 12 forms of a 250-item
number-checking test were administered at 5-minute intervals.
The average correlation for succeeding pairs was .925 and
declined to .583 with 10 interpolated tests. Cureton (1939),
using a longer time interval (5 days), obtained similar results.
Owens (1942) gave a group of 15 subjects eight repetitions of
seven motor tests. One of the main findings of the study was
that intra-individual differences were greater than
inter-individual differences. Despite these laboratory indications
that even relatively simple task performance was not reliable,
the application to determining job performance reliability
has been limited; however, some studies have been done on
task and job performance.

Craig (1924-25) reported one of the earliest studies
attempting to determine job performance reliability. With
"retail saleswomen" it was determined that a "value of sales"
criterion had a reliability of .79. Hayes (1932-33) in four
studies reports reliabilities of .78, .81, and .87 on first
four weeks output vs. second four weeks for various female
shop workers and .81 for average "bogey" percentages first
two vs. second two weeks, all of which are probably inflated
due to the effect of learning. Bellows (1940) reported two
studies on operators of card punch machines and coding clerks.
With a criterion of errorless production the former showed
reliabilities of .89 to .96 and the latter .87. Ayers (1942)
used four criteria to evaluate textile inspectors. The
criteria, with reliabilities by first vs. second week, were
failure to discover defective units (.73), average hourly
production (.85), incidence of units which should not have been
put aside (.83) and total units set aside for foreman's
decision (.91). Hay (1943) used the control of requiring at least
eight months on-the-job before obtaining reliability measures
for a group of bookkeepers. On three occasions he correlated
first and third days' production with second and fourth, with
coefficients of .93, .85 and .98. The correlations between
the three "occasions" were .83, .79, and .72. Strong (1934-35,
1943) in studies with life insurance agents showed that
year-to-year production varied, with reliabilities of .74 to .84 at
various levels of production, and another criterion, average
production of 1926-27 vs. 1929-30, was .81. MacKinney and
Wolins (1960) on a year vs. year basis found reliabilities of
.45, .25, .55 and .47 for, respectively, suggestions submitted
by foremen, suggestions installed, suggestions submitted by
foremen's subordinates, and subordinates' suggestions installed.
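
The reliabilities cited above are, for the most part, simple correlations between output in one period and output in another. As a minimal sketch of the odd-versus-even-days design attributed to Hay, the Python example below correlates first-plus-third-day production with second-plus-fourth-day production. The output figures are hypothetical and were invented purely for illustration, and the helper `pearson_r` is a generic product-moment correlation, not a reconstruction of any cited author's computation.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical daily output for five workers over four days.
daily_output = [
    [52, 55, 50, 54],
    [61, 59, 63, 60],
    [47, 49, 45, 50],
    [70, 68, 72, 69],
    [58, 60, 57, 61],
]

# First and third days versus second and fourth days.
odd_days = [row[0] + row[2] for row in daily_output]
even_days = [row[1] + row[3] for row in daily_output]

reliability = pearson_r(odd_days, even_days)
print(round(reliability, 2))
```

A coefficient obtained this way describes only the consistency of rank order across the sampled days. It says nothing about stability over longer intervals, which is the point raised by the year-to-year figures.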



Training research literature provides further insight
into the nature of performance reliability in terms of
individual dynamics. Smith and Gold (1956) examine the relation
of early training performance to post-training performance.
Their results indicate a progressive increase in the
correlations between various stages during training and post-training
production. They report a range of from .46 between the third
and fourth week of a 20.5 week program and post-training production
to about .82 between the ninth and tenth week of the 20.5 week
program and post-training production. A similar effect is
demonstrated by the Kornhauser (1923) op. cit. study. Manning
and DuBois (1958) employed a unique design to eliminate the
effect of pre-training proficiency by using the pre-training
proficiency/post-training proficiency regression to obtain a
measure of relative gain (residual, their term) and found the
split-half reliability of total (crude) gain = .56, relative
(residual) gain = .37, and final status = .77. Relative gain
was considerably more predictable than gross (crude) gain but
not as predictable as final status. Fleishman and Fruchter
(1960) conclude that early performance in learning Morse code
is due to specific aptitudes and later performance probably
due to non-aptitude factors such as specific habits acquired
during training. Bass (1962) likewise concludes that the
decline in test validity over time is due to decreased importance
of aptitudes and increased importance of esteem and popularity
in sales work. Obviously several factors contribute to the
variability of reliability. The impact of the ongoing process
on the characteristics of the individual and the dynamic
nature of performance requirements are the two which seem most
evident. The problem of temporal proximity, well known in
educational research, only magnifies the problem of
intra-individual variability.
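
The distinction drawn above between crude gain and relative (residual) gain can be made concrete. Crude gain is the simple difference between post-training and pre-training proficiency. Residual gain is the part of post-training proficiency not predicted from pre-training proficiency by a linear regression. The Python sketch below illustrates the two quantities under the assumption of an ordinary least-squares regression of post-training on pre-training scores. The scores are hypothetical, and the function names are illustrative rather than drawn from Manning and DuBois.

```python
def crude_gain(pre, post):
    """Simple difference score: post-training minus pre-training."""
    return [y - x for x, y in zip(pre, post)]


def residual_gain(pre, post):
    """Post-training score minus the score predicted from pre-training.

    Uses an ordinary least-squares line fitted to the same data
    (an assumption made for this illustration).
    """
    n = len(pre)
    if n != len(post) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_pre = sum(pre) / n
    mean_post = sum(post) / n
    sxx = sum((x - mean_pre) ** 2 for x in pre)
    sxy = sum((x - mean_pre) * (y - mean_post) for x, y in zip(pre, post))
    slope = sxy / sxx
    intercept = mean_post - slope * mean_pre
    return [y - (intercept + slope * x) for x, y in zip(pre, post)]


# Hypothetical proficiency scores for five trainees.
pre_scores = [10, 12, 15, 18, 20]
post_scores = [14, 18, 19, 25, 24]

print(crude_gain(pre_scores, post_scores))
print([round(r, 2) for r in residual_gain(pre_scores, post_scores)])
```

By construction the residual gains are uncorrelated with pre-training proficiency in the sample, which is what removes the effect of initial standing from the gain measure.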

A series of studies by Rothe (1946a, b, 1947, 1951) and
Rothe and Nye (1958, 1959) was specifically aimed at
determining the reliability of job performance in several different
occupations. In general this series of studies found
individual output to be highly erratic and specific to the individual;
enormous ranges were found and, to quote from the 1958 study,
"In this entire series of studies of industrial output the
most striking single result is the lack of consistency from
time to time, especially when there is no financial system in
operation. A second important result is the wide range of
'consistency coefficients' of output data, such that a researcher
could be entirely misled by tests of statistical significance
if he just happened to select a period of unusually high or
low consistency."

The findings of Rothe and Nye are supported by others
aimed at assessing job reliability. For example, Cohen and
Strauss (1946), in an extremely detailed study of performance
in a relatively simple task, show that different persons rarely
do a given task in the same way. They also found a 1/3 ratio
of time with different methods of doing the same job and say,
"From the point of view of the methods analyst, there are as
many different methods of performance as there are operators."
The study casts doubt upon the feasibility of group reliability
indices and raises the possibility that the entire question of
individual job performance reliability should be re-cast in a
unique theoretical context. Perhaps adequate investigation
will require longitudinal study of individual subjects. This
approach would control for the interaction of unique individual
characteristics with situation characteristics. It is entirely
possible for a relatively routine task to vary over time in
terms of the responses required. Certainly this is obvious for
complex tasks.



Carter and Dudek (1947), in a carefully controlled study of
navigator proficiency, found high reliabilities for single
missions but low reliabilities between missions; in fact, they
concluded, "...in many complex skills reliability for any
particular trial may be high and yet the correlations between
trials, which correspond to test-retest reliability, may be
low." That such may be true of other than complex skills is
indicated in a study by Klemmer and Lockhead (1962). In the
study, of over 1000 operators of key punch and bank proof
machines, it was found that individual variability was about
6-10% of the group mean and, further, that operator variability
is relatively independent of mean production level.
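
A rough sketch of how the two quantities in the Klemmer and
Lockhead finding could be computed is given below. It assumes
that "individual variability" means each operator's standard
deviation over repeated production periods; the source does not
say which index was used, and the output figures are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def variability_profile(output_by_operator):
    """Return {operator: (mean output, SD, SD as a share of the group mean)}."""
    group_mean = mean(mean(v) for v in output_by_operator.values())
    return {
        op: (mean(v), stdev(v), stdev(v) / group_mean)
        for op, v in output_by_operator.items()
    }


def level_variability_correlation(profile):
    """Correlation between operators' mean level and their SD.
    A value near zero means variability is independent of level."""
    levels = [m for m, _, _ in profile.values()]
    spreads = [s for _, s, _ in profile.values()]
    ml, ms = mean(levels), mean(spreads)
    num = sum((a - ml) * (b - ms) for a, b in zip(levels, spreads))
    den = sqrt(sum((a - ml) ** 2 for a in levels)
               * sum((b - ms) ** 2 for b in spreads))
    return num / den


# Hypothetical output per period for three operators.
output = {
    "A": [100, 110, 90],
    "B": [200, 210, 190],
    "C": [150, 170, 130],
}
profile = variability_profile(output)
r_level_spread = level_variability_correlation(profile)
```

In this toy data the slowest and fastest operators are equally
variable, so level and variability are uncorrelated, which is the
pattern the study reports for real operators.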

A facet that contributes to performance reliability but
which has received relatively little attention is that different
persons do the same job in different ways. As long ago
as 1939, Seashore discussed this aspect. He pointed out that
motor, auditory and visual tests show low intercorrelations
and personality inventories indicate many possible approaches
to problem situations. Walker, et al. (1946) tested five
experienced pilots for accuracy on 10 different criteria for
landing aircraft. Two procedures were used, "Tricks Allowed"
and "No Tricks," meaning an individual vs. a standardized
landing procedure. The performance of individual pilots showed
more variability under the standardized condition than the
unstandardized, and under "Tricks Allowed" accuracy of landing
was significantly increased. While the scope of this experiment
was quite limited, it is indicative that job performance
among experienced personnel does vary and, in fact, such
variability might be desirable. It illustrates once again the
point that measures of reliability would be quite different
depending upon which aspect of the job happened to be measured.

It is unfortunate that studies of job performance reliability
largely must be culled from the literature. However,
one group of workers, in department stores, has been covered
in separate studies that are of interest. Craig (1924-25)
op. cit., in a study of 109 saleswomen, found a reliability
of .79 for value of sales over a period of several months,
and Stead (1937) found coefficients of .83 to .98 over eight
objective measures of performance. Otis, et al. (1940) found
the following reliabilities for six measures of job performance:
gross sales per day .88, ratio of salary to net sales .83, net
sales per day .87, number of sales per day .89, returns per
day .75, and actual quota per day .83. The latter study also
shows the following table of intercorrelations:

                     Returns    Number of Sales    Quota
Gross Sales            .58           .47            .65
Returns                              .01            .[?]2
Number of Sales                                     .24

With the high reliabilities found for the variables in
the above table and their varying intercorrelations there are
obvious implications for job performance reliability. Some of
the implications are: how broadly is "job performance" defined,
how and over what period of time is reliability measured and,
possibly, is performance variability an individual characteristic?

The "how" of reliability measurement is directly related
to the individual characteristic of variability. The common
method for estimating performance reliability is, of course,
to correlate two measurements of performance level at different
periods. However, the previously mentioned studies by Klemmer
and Lockhead, Rothe, and Rothe and Nye all indicated that
individual variability is to a large degree independent of level
of performance. Coombs (1948) discussed possible different
measurements of the same performance but the implications of
his study have remained relatively unexplored. Kellner (1960)
has shown that the use of "discrepancy scores" in both
predictors and criteria results in better performance prediction
and has outlined a solid theoretical base for the practice.

Ghiselli (1956) op. cit., in a general discussion of the
area, virtually dismisses the idea of an index of job performance,
let alone its reliability, and Ghiselli (1960a, 1960b, 1963)
has shown that some of the classic concepts of psychometric
theory can be seriously questioned when related to job
performance measurement. In the latter study it is shown that
the classic error of measurement may be better understood as
related to traits of particular individuals rather than as a
group concept. The general concept of moderators had been
studied by others, as Fiske (1957a, 1957b) and Berdie (1961),
but Ghiselli showed how they could affect prediction of
performance. However, as applied to performance per se there is
little evidence to show the effects, if any. Actually the
study of individual performance variability is just beginning,
although the problem was thoroughly discussed in a summary
article by Fiske and Rice (1955). In their evaluation of the
evidence for intra-individual response variability, the authors
distinguished three types of response variability. They were
"spontaneous," as might be found with instrumental acts,
"systematic," where a response is affected by the preceding
response or stimulus, and "variability due to changes" in the
subject or situation. One of the major conclusions of the
article is that there is a real lack of knowledge in the area,
particularly in that of well learned activities.

If we extend our concept of performance behaviors to include
acts or incidents which are not directly related to the
job functions performed by the individual, we find some
interesting but conflicting results. Behaviors such as tardiness,
absenteeism, accidents, grievances, supervisory reprimands,
and dispensary visits are considered by some to be indications
of organization performance (Merrihue and Katzell, 1955) and
individual performance (Herzberg, et al., 1959). Apart from
any relation to mental health, the fact remains that each
variable is subject to objective measurement. Yet reliabilities
vary widely depending upon the situation and the population.
Tardiness, absenteeism, grievances and reprimands seem to be
the least stable except over long time periods. On the other
hand, accidents and dispensary visits seem to be quite stable
with high reliability reported--until the "objective" record
is purged of such things as situational hazards, failure to
report, inadequate records, etc.

However, with the purified criterion behavior another
problem is encountered: in the case of accidents, a shrinking
population of "performers." If the cut off point is established
as being a chargeable lost time accident, data collected
over as long a period as two years still leave, in most cases,
the majority of the population in the zero frequency category.
The assumptions that the extended time period will provide the
opportunity to "act" and that basing research on groups will
ferret out the relationships simply beg the question. The
fact is that the assumptions are an admission of ignorance or
inability to define or measure the performance behaviors being
investigated. Psychologists have long accepted either the
"J" curve or Poisson distribution as correct to represent low
probability single "acts" or incident performances. While this
concept does have substantial mathematical support, it seems
too parsimonious when applied to situations in which the
individual participates purposefully. The restriction in range has
its obvious consequences. Extending the time period has other
equally undesirable consequences, as may the occurrence of the
accident itself as postulated by Mintz (1954).
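
The size of the zero frequency category under a Poisson
assumption can be worked out directly. The sketch below is only
arithmetic on that assumption; the accident rate is hypothetical
and is not taken from any of the studies cited.

```python
from math import exp, factorial


def poisson_pmf(k, expected_count):
    """Probability of exactly k incidents when expected_count are expected."""
    return expected_count ** k * exp(-expected_count) / factorial(k)


def share_with_zero(rate_per_year, years):
    """Expected share of the population with no recorded incident."""
    return poisson_pmf(0, rate_per_year * years)


# Hypothetical rate: one lost time accident per ten man-years,
# observed over a two year period.
zero_share = share_with_zero(0.1, 2)
```

With a rate this low the zero category still holds most of the
population after two years, so lengthening the record does little
to relieve the restriction in range.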

A similar phenomenon is encountered using patent applications,
or publications, as a measure of creativity, even when
both are corrected for opportunity bias. Taylor (1959) reports
factorial reliability estimates (communalities as lower bound
estimates) of .23 to .75 for objective indices of scientific
productivity and creativity.

It would seem from the foregoing that either measurement
is faulty or that measurement is not entirely relevant. It
requires only minor immersion in a performance situation to
become aware that, with individuals who are not tardy, the
time of arrival to work varies considerably, or that among those
who do not have lost time accidents there is considerable
variation in frequency of cuts, scratches and bruises which do
not receive medical attention and are not recorded. Likewise,
the scientists who hold no patents may on close examination
vary considerably in the frequency of "near" patentable ideas.
It seems that major flaws, insofar as reliability is concerned,
are in definitions and record-keeping of reasonably important
kinds of individual incidents. The data exist; individuals
are performing in spite of the failure to measure adequately.

An answer to the question heading this section would appear
to be impossible with present knowledge. Actually, as
later discussed, job performance is a complex of more or less
unrelated tasks, few of which have been measured adequately in
terms of their reliability. The correlation of absolute
performance levels affording the classic estimate of reliability
actually avoids or at least beclouds the real issue of
individual variability of performance. The limited number of
studies indicates that individual performance variability is
as much a characteristic of the individual as is an aptitude,
personality trait or other more commonly measured characteristic.
Actually little knowledge is available as to the extent or
importance of individual variability; in fact, it is almost
possible to turn the cliche, "more research is needed," into
a more pointed "some research is needed," probably using
intra-class correlation from analysis of variance designs. If in
no other way, it will at least define performance reliability
and, it is possible, that individual variability itself may
be a better predictor or criterion than those that have been
employed in the past.
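
As an illustration of the intra-class approach suggested above,
the following sketch computes a one-way intra-class correlation
from repeated measurements of each person. The one-way form is
an assumption chosen for simplicity; the text does not specify a
particular analysis of variance design, and the function name is
hypothetical.

```python
def intraclass_correlation(scores):
    """One-way intra-class correlation.

    scores is a list with one entry per person, each entry a list of
    k repeated performance measurements for that person.
    """
    n = len(scores)
    k = len(scores[0])
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    person_means = [sum(row) / k for row in scores]

    # Between-person and within-person sums of squares.
    ss_between = k * sum((m - grand_mean) ** 2 for m in person_means)
    ss_within = sum((x - m) ** 2
                    for row, m in zip(scores, person_means)
                    for x in row)

    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```

The coefficient approaches 1 when each person's repeated
measurements agree closely relative to the differences between
persons, and it falls toward zero or below as within-person
variability grows. The within-person mean square is itself the
index of individual variability the paragraph calls for.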
Is Observation of Job Performance Reliable?

In this section are reported selected studies where the
same job performance is evaluated by different methods or by
different raters. The latter point is often difficult to
judge from the research report. The authors have probably
erred in being overly conservative in selecting studies, but
the effort was made to be as certain as possible that the
different estimates of the same performance were independent.

An early study by Braunshausen (1929) correlated supervisory
ratings with job sample test scores. For two different
groups, "Yule's coefficients of association" were .41 and .56.
Fay and Middleton (1942), in an ingenious attempt at performance
evaluation, obtained recordings of two sales "scripts" read
by 29 retail salespersons. Each reading was rated by 139
college students for (1) enthusiasm, (2) convincingness and
(3) sales ability. The following correlations were obtained
between ratings of the first and second scripts:





[Table of correlations illegible in source. Legible entries:
.53, .80, .71; a row label "Sales" also survives.]



These studies illustrate a point that continually recurs
in the literature; that is, ratings tend to show higher
correlations with each other than do more objective measures, and
ratings tend to fall somewhere between the two.

Comrey (1949) analyzed achievement by West Point cadets,
with grades in seven different courses and a composite of ratings
by peers, academic and military instructors as criteria. A
factor analysis of the criteria resulted in eight factors, with
variance from ratings appearing in only two of the factors, to
indicate again the relative independence of different performance
measures. Ryans and Frederiksen (1951), in discussing the general
point of observer reliability, cite a study without further
identification where raters judging "metal objects" constructed
to specifications showed reliabilities of .11 to .55 in their
judgments. Use of taper gages in the judging raised the
coefficients to .93 and .94. They go on to say, "It is possible
to study [performance (as] distinguished from judging
performance) only where the reliability of judging
performance has been shown to be adequate." Gaylord, et al.
(1951), in a study directly concerned with the relationship
of performance ratings to measures of actual production, found
coefficients of .55, .48 and .49 between the former and three
indices of production among file clerks. In addition, the
raters had production records available, leading to some
contamination and probable inflation of the coefficients found.

Peters and Campbell (1955) intercorrelated self and
supervisor ratings of proficiency and scores on a diagnostic
proficiency test of Air Force mechanics' job knowledge. Correlations
ranged from .32 between the second level supervisor ratings
and the test, to .37 between the self rating after taking
the test and the test scores. Pre-test ratings and first
level supervisor ratings correlated .33 and .35 respectively with
the test. The authors conclude that ratings are not closely
enough correlated with diagnostic proficiency test scores to
warrant a substitution. To sum up this point, Gaylord, et al.
(1951) op. cit. conclude that the correlation between two
criteria should greatly exceed the level usually obtained in
validation studies between predictor and a criterion. Their
results show correlations of .48 to .55 between composite
production records and ratings and .24 to .46 between job
elements and ratings.

Springer (1953) compared ratings made by supervisory
personnel and by co-workers for promotion to leadman jobs. With
a graphic, five item scale, ratings were obtained from 100 workers
and, with a graphic, eight item scale, from 68 supervisors. The
co-worker reliabilities ranged from .34 to .48, the supervisors
.56 to .71 and co-workers vs. supervisors from .15 to .39. In
this situation one might be faced with a possible choice between
using the one set of ratings or the other. The higher
reliability of the supervisory ratings might indicate the choice,
but Hollander (1954), in various Navy studies, has indicated
that "buddy ratings" have been found better predictors for
some aspects of performance than supervisory ratings. Hollander
(1956), Hollander and Webb (1955), and Wherry and Fryer (1949)
rule out the postulated contaminating effect of friendship in peer
nominations and in fact the evidence suggests friendship may
be beneficial, perhaps in terms of opportunity to observe.

It is possible that more investigation of this area would
indicate that each type of rating would have its place. It
is apparent, in any case, that performance ratings by raters
with different points of view have little in common. Liske,
Ort, and Ford (1962) found higher interrater agreement of medical
student clerkship performance when rater and ratee were in the
same specialty. There were no essential differences in faculty
rating faculty vs. students rating students. Interestingly
though, while ratings were consistent from time to time for a
composite (intra-class correlation), interrater agreement
(r) was only .05 for faculty and .31 for students. The low
reliability, the authors conclude, is a function of combining
raters and ratees with different specialties.

Some indirect evidence of differences in the perception
of the importance of job acts is provided by Prien (1962) and
Prien and Powell (1961). In the former, factory foremen and
their immediate superiors completed a checklist describing the
foreman's job. The average correlation of the relevant pairs
(foremen and superiors) was .40. In the latter, training
directors and their immediate superiors followed the same
procedure and the averaged correlation was .53. Here persons
directly involved in the job cannot agree as to the relative
importance of duties and virtually must disagree in any
performance observations.

Over all it seems evident that the rater must be
knowledgeable to contribute real variance in ratings. This general
point has received further confirmation in a study by Hicks
and Stone (1962). Correlations between ratings by peers and
supervisors of management personnel were .51 for over all
performance, .59 for promotability and .59 for versatility.
While these values are higher than those reported above, they
still indicate a real lack of agreement between raters.

Finally, a study by Whitla and Tirrell (1953) had 100
mechanics rated on three areas--how well they could get along
with others, how well they knew their job, and how well they
could do their job--by an immediate superior non-commissioned
officer, a flight chief, and a first level commissioned officer.
Validity coefficients, against a job knowledge test criterion,
were .25 to .42 for the first group, .18 to .21 for the second
and .20 to .25 for the third group of raters. This includes
the correlations for irrelevant measures (getting along vs. test
scores), which certainly do not appear to differ from the relevant
correlations. Similarly, Prien and Liske (1952) found averaged
correlations over eight graphic scales to be .50 between first
and second level supervisors, .25 between self ratings and
first level supervisors and .13 between self ratings and second
level supervisors.

Siegel (1954) directly attacked the question of the
relationship between various observations of the same performance.
In a study with Navy craftsmen, performance on four tasks
(aluminum welding, plastic patching, splicing a cracked aircraft
channel, and repairing aircraft fabric) was evaluated by a
"check list" for each task and by a ranking of end products by
chief petty officers. The inter-examiner reliabilities were
.91 to .97 and retests .87 to .83. However, the rho values
between check lists and chief petty officers' ratings were for
welding .41, patching .[?]5, splicing .26, and fabric repair
.33. It is again obvious that where two or more differently
made observations of the same performance are available, the
relationship between them is usually low. Siegel, et al. (1960)
found in another much more comprehensive study that ratings by
Navy craft supervisors on proficiency and training needed by 70
aviation machinist's mates correlated .35, whereas one would
expect a higher relationship on the basis that if proficiency
is low, there is a need to recommend training. In the previously
mentioned study by Peters and Campbell, self ratings correlated
with first- and second-level supervisors' ratings yielded
correlations of .30 and .23 respectively. The supervisors'
ratings correlated .47 for a total sample of 154 mechanics.
Although the composite self and supervisor rating correlated
.46 with the proficiency test, the prediction is considerably
short of what could be considered equivalent results.

Bayroff, et al. (1954) reported an experimental study designed
to evaluate Army experience with ratings. Some of the relevant
findings were that rating ability is a predictable individual
skill, several ratings are better than one, control groups
should be used to evaluate raters, rater reliability can be
assessed properly only by using inter-individual agreement
as an index, and that reliabilities tend to drop over a series.
Related to this is a study by Buckner (1959), who divided raters
into four classes on the basis of the extent to which they
agreed in rating the same men. His results showed that higher
agreement resulted in poorer prediction of performance in
submarine school work. Possibly the clue to these discrepancies
lies in two other studies by Haggerty, et al. (1959) and
Mackie and High (1959). The former obtained ratings of West
Point graduates as platoon leaders or company commanders in
Korean combat. With multiple ratings on officers who had
been in service for several years, the reliabilities ranged
from .30 to .63; it will be remembered that the Bayroff study
found rating reliabilities tend to fall over a series. The
latter study was concerned with Navy machinery repairmen who
completed job sample performance tests and the relation of these
results to ratings. The correlations were .32 and .35 with two
school ratings (2 years earlier) on suitability for doing the job
and .42 with predicted suitability as a machinery repairman.

It would appear from these studies that whatever it is that
ratings rate is changeable over a period of time and has little
relation to objective measures of job performance. It may well
be that with changes in skill level or with changes in job
requirements over a period of time, the personal behaviors
required become more complex, less subject to observation and
thus less reliably rated. The concept of the dynamic character
of criteria (Ghiselli, 1955) op. cit. is equally applicable
to performance behavior. This is a particularly attractive
explanation if the earlier definition of criterion behavior
as situationally determined performance is accepted.

Some general studies covering the problems encountered in
job performance evaluation have been reported. Severin (1952)
summarized some 150 studies where correlations were reported
between different measures of job performance for the same
people, such as supervisory ratings vs. production, tests or
some other measure, associate ratings vs. similar measures, and
training grades vs. production records. The study can be
summarized by the quotation, "The median of all correlations in
the table was .28 which seems to be further evidence that one
cannot properly substitute one measure of job performance for
another without first knowing the degree of equivalence." In
this connection, a study by Langdon (1932) is of interest. He
reported a correlation of .30 between a work sample test and
later piece-rate wages, in a study of the relation between
intermediate and ultimate criteria. Ghiselli and Brown (1951)
op. cit., reviewed studies covering some 30 years that reported
both training and job performance correlations. The correlations
between the two different measures ranged from .15 to .22 for
three job classifications and all jobs. Fleishman and Fruchter
(1960) op. cit., found correlations of .26 to .41 between
successive stages of learning Morse code and conclude that selection
tests mainly predicted initial success but later success was
more a function of specific habits acquired during training.

All four of these latter studies emphasize the desirability
of differing methods of assessing job performance at differing
levels of proficiency and also raise the question of whether
or not the more successful trainees make the more successful
later performers. Unfortunately, there is little direct
evidence on this question. Hilton and Dill (1962) found, however,
that later salaries do correlate with starting salary. Again
this suggests rather complex phenomena.

Perhaps the most definitive study of performance
observation reliability is that of Lifson (1953). In
this study trained time-study personnel rated the "work pace,"
as compared to "normal," of five different persons on four
different jobs. Each of these was rated twice at a one-month
interval. The "workers" were students who had had industrial
experience and who "worked," after considerable practice, paced
by a metronome. The study revealed that ratings involve
considerable error, some raters rate higher, some workers are
rated more reliably, some jobs are rated more reliably, raters
tend toward a norm, and interactions are of importance; an
analysis of variance showed that one-third of the variance came from
rater-to-rater differences. A more recent study by Whitlock
(1963) demonstrated a close relationship between reported
"effective performance specimens" and ratings; however, the raters
knew the individuals about whom the performances were reported
and whom they later rated. The lack of independence may be
the basis for the reported relationship of effective behaviors
to higher ratings.
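
The kind of partition behind a statement such as "one-third of
the variance came from rater-to-rater differences" can be
sketched for the simplest case. Lifson's design also crossed
jobs and occasions; the sketch below is an assumption-laden
reduction to a single workers-by-raters table, and it reports the
raters' share of the total sum of squares rather than a formal
variance component. The function name is hypothetical.

```python
def rater_share_of_variation(ratings):
    """Share of total sum of squares due to differences between raters.

    ratings[w][r] is the rating given to worker w by rater r; every
    rater is assumed to have rated every worker exactly once.
    """
    n_workers = len(ratings)
    n_raters = len(ratings[0])
    grand_mean = sum(sum(row) for row in ratings) / (n_workers * n_raters)

    # Mean rating given by each rater across all workers.
    rater_means = [sum(row[j] for row in ratings) / n_workers
                   for j in range(n_raters)]

    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_raters = n_workers * sum((m - grand_mean) ** 2 for m in rater_means)
    return ss_raters / ss_total
```

A share near zero would mean the raters use the scale alike and
differ only in how they order the workers; a large share means
that some raters simply rate higher than others, one of the
effects the study reports.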

Kipnis (1960) and Taft (1955) op. cit., have discussed
some of the major difficulties and distortions that are
involved in the observation of performance. Although the former
refers mainly to ratings and the latter to "the ability to judge
others," both seriously question the reliability of human
judgments of the performance of others. Taft mainly emphasizes
distorting traits within the observers: intelligence is of
some importance in judging others, emotional stability has
some, though not a linear, relationship to ability to judge,
self-insight gives better judgment on any particular trait, and
"social skill" is an important factor. Others are mentioned but
these are sufficient to show that perhaps studies of raters are
needed more than continued performance ratings. Kipnis, in contrast,
emphasizes factors more or less independent of performance per
se. These are grouped under "External Factors," i.e., propinquity
in the sheer physical sense, social setting whether cooperative,
punitive or whatever, and whether or not criticism is encouraged,
and "Subordinate Behaviors," as whether behavior "helps" the
rater, halo by a subordinate doing well what the rater emphasizes,
personal stake by the rater in the rating or its use, and various
other such considerations.

The studies cited indicate that reliability of job
performance observation as presently practiced can be seriously
questioned. It is usual to find, where two or more independent
observations occur, that the correlation between them is low,
especially in situations where an "observation" is some
relatively objective measure; for example, a job performance test.
The history of evaluating job performance shows the importance
of separate measures and limits the value of any studies using
a single measure of job performance, even as two raters. It
would appear that a major aspect of the "criterion problem"
is the fact of unwanted variance and, further, that the sources
of this variance are virtually unknown.

In addition to the foregoing information, there is another
characteristic of job performance that has been only implicit
in the above--the multi-dimensional nature of job performance.
The next section will present some of the known information on
this topic and how it poses basic problems in the evaluation
of job performance.

Is Job Performance Uni-Dimensional?

The history of personnel research is studded with the
development and use of literally hundreds of performance predictors.
In the testing area alone, Guilford (1959) has estimated that
50 of possibly over 100 abilities have been described. In
contrast, the majority of reported studies using the predictors
have had a single global measure of performance. While it would
seem that performance in a particular job is much simpler than
the total of individual abilities, is it meaningful to reduce
performance measurement to a single measure? In addition, while
a particular measure of performance may be identified in several
seemingly identical jobs, is it not conceivable that the only
similarity is the name given the performance behavior?

The likelihood and consequences of job performance
complexity were given early recognition by Kingsbury (1933):
"Some executives are successful because they are good planners,
although not successful directors. Others are splendid at
co-ordinating and directing, but their plans and programs are
defective. Few executives are equally competent in both directions.
Failure to recognize and provide, in both testing and rating,
for this obvious distinction is, I believe, one major reason
for the unsatisfactory results of most attempts to study,
rate and test executives. Good tests of one kind of executive
ability are not good tests of the other kind." Otis (1953),
op. cit., cites a similar example of the college professors
who may be equally successful, one on the basis of research
competence and productivity, and another on the basis of
classroom competence.



Another approach to study of performance is the direct
description of the characteristics of successful and
unsuccessful performers. Henry (1949) and Ghiselli and Barthol (1956)
differentiate the successful from the unsuccessful manager,
suggesting a relation between personal characteristics and
achievement. Dalton (1951), on the other hand, failed to find a formal
pattern of characteristics in career achievement. Informal
processes did seem to play a part in career achievement,
including such things as religion, ethnic background, political
belief and participation in accepted organizations. These
contradictory results lend little to the concept of individual
achievement save to indicate that firm bases for investigation
are lacking.

Despite early recognition of the probable existence of several
dimensions of job performance, it is only in comparatively recent
years that the field has received much attention. Flanagan (1949,
1954a, 1954b) op. cit., has discussed the use of his critical
incident technique in isolating and defining "job elements." As
previously described, this has been the only systematic attempt to
define job performance in terms of its complexity and specifics.
However, it is dependent upon observation and reporting of
performance as is, and as has been discussed, there is a real
question as to the reliability of both. In addition, there is a
question of what job performance could or should be which is not
investigated with this technique, or for that matter, with any
other.

Another approach to defining the dimensions of job functions is
illustrated by the studies of Jaspen (1949) and Palmer and
McCormick (1961). Both studies are factor analyses of job
descriptions and both recognize their limitations in that they are
exploratory. The former study shows six meaningful factors in
"lower level" jobs and the latter four in a sample of 250 steel
mill jobs. Both of these exploratory studies have indicated that
even relatively simple jobs have several independent dimensions and
the possibility that more would be found with more rigorous
investigation. Studies of the job functions of executive positions
by Hemphill (1959) and of supervisory positions by Prien (1963)
reveal ten and seven dimensions respectively. It would appear safe
to assume that independent functions justify the search for
independent performance criteria. Studies by Turner (1960), and
Peres (1962), Roach (1956), and Grant (1955) further substantiate
the judgment of complexity of job performance however described.
These more generally oriented studies have indicated that job
performance has a complexity that would require coverage by
multiple measurements. This general statement is amplified in what
follows by noting the complexity of single measures, single jobs
and the relationships of performance indices.

Analyses of single measures of job performance have shown that
often they are more complex than seems indicated. For example,
analyses of ratings have shown that the intercorrelations of trait
scales describe more than one dimension in job performance. Ewart,
et al. (1941) factor analyzed a 12-trait scale and found three
factors, Bolanovich (1946) in an analysis found six factors. Taylor
and Munson (1951), in a carefully controlled study, present
intercorrelations showing for the most part, low to moderate
correlations among separate traits. Hilton and Dill (1962) op.
cit., in an analysis of "salary growth," as a criterion, have shown
the considerations that must be given to any single measure for use
as a criterion, for example, salary growth is independent of years
of employment for the first six years but is highly sensitive to
first year salaries. Huse and Taylor (1952) using records on
absences for two years on total times absent, total days absent,
one day absences and absences of three days or longer found
intercorrelations of .00 to .88 among various measures, with
absence frequency being the single most reliable measure. King
(1960) reported a factor analysis of a 20-item questionnaire
covering only "attitude toward company." The study, in ten plants
and with 735 employees, found three factors in the one attitude.
Weherman (1948) in a well designed study of employees submitting
grievances found 13 items of personal or personnel data that
discriminated between grievants and nongrievants. Lurie (1942)
factor analyzed 12 indices of occupational adjustment and found
three factors in the indices.

From these single measure studies of varying aspects of job
performance, it appears that even such relatively simple measures
are multi-dimensional in both their behavioral and causal aspects
and that global measures of such performance are of doubtful
utility.

Even though the multi-dimensional nature of job performance
received early recognition, investigation of the dimensions was
later in starting and even yet only rudimentary knowledge is
available. An early study by Gottsdanker (1943) is illustrative of
the general results obtained when several measures of job
performance, particularly those of an objective nature, are used.
Using as subjects 44 women learning to operate calculators and as
criteria 20 minute tests in a work book, the following
intercorrelations were obtained:





                 II       III      IV

    Test I       .50      .45      .13

    Test II               .84      .24

    Test III                       .38



The tests were simple arithmetical calculations of increasing
difficulty and yet the interrelationships, on what would seem to be
an easily learned and unitary skill, are quite varied.
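
One rough way to see how far these four tests are from a single
unitary skill is to ask how much of the total variance the largest
principal component of the correlation table accounts for. The
sketch below is an illustration only and is not Gottsdanker's own
analysis. It assumes that the unlabeled third row of the table is
Test III, fills the diagonal with 1.00, and uses a plain power
iteration; the helper name largest_eigenvalue is hypothetical.

```python
# Correlations among Tests I-IV as reported above (diagonal set to 1.00).
# Assumption: the .38 entry is the Test III / Test IV correlation.
R = [
    [1.00, 0.50, 0.45, 0.13],
    [0.50, 1.00, 0.84, 0.24],
    [0.45, 0.84, 1.00, 0.38],
    [0.13, 0.24, 0.38, 1.00],
]


def largest_eigenvalue(matrix, iterations=200):
    """Estimate the largest eigenvalue of a symmetric matrix by power iteration."""
    n = len(matrix)
    v = [1.0] * n
    lam = 0.0
    for _ in range(iterations):
        # Multiply the matrix by the current vector.
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        # Rescale to unit length so the values do not grow without bound.
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
        # Rayleigh quotient of the unit vector gives the eigenvalue estimate.
        lam = sum(
            v[i] * sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)
        )
    return lam


lam = largest_eigenvalue(R)
# Total variance of a correlation matrix equals the number of variables.
share = lam / len(R)
print(f"largest eigenvalue = {lam:.2f}, share of total variance = {share:.2f}")
```

The remaining share (one minus the printed value) is variance that a
single general component leaves unexplained, which is one way of
expressing the "quite varied" interrelationships noted above.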

During World War II, one of the most intensively studied jobs was
that of learning to fly aircraft. With the pass-fail criterion,
Guilford (1947) showed that eight factors were involved in this
single criterion of performance. Further analyses of the same job
by Dudek (1949) and Michael (1949) compared factor loadings in the
criterion with different populations. The former used two groups of
pilot trainees and one group of women trainees, the latter, two
groups of white and one Negro group of trainees. Both studies found
that the factorial description of the criterion varied from sample
to sample. The variability was not only in weights but the
appearance or non-appearance of different factors. An investigation
by Fleishman and Ornstein (1960) indicated that such global
measures may be even more complex than is shown by these studies.
Sixty-five flying students were tested on 24 maneuvers. A factor
analysis of the maneuver score intercorrelations revealed six
factors in the maneuvers. When it is considered that maneuvering is
only a limited aspect of the aircraft commander job and such
maneuvering is factorially complex, it can be surmised that the
composition of the entire job is factorially formidable. As a
sidelight to the cited study by Fleishman and Ornstein, it might be
noted in reference to performance reliabilities that the
reliabilities of individual maneuvers, as estimated by the
communalities, varied from .20 to .77 with a median of slightly
below .60. In view of the studies cited, two quotations from them
are pertinent, i.e., from the last, "Similar analyses of the
interrelationships among component performance measures of other
complex jobs may provide one way of defining the ability
requirements underlying proficiency in those jobs." And, from
Michael, "It is quite probable that the gross pass-fail criterion
could advantageously be replaced by many independent and relatively
pure criteria." Trites, et al. (1959) attempted to do this. In his
study, an analysis of performance for general flight training
revealed five factors extracted from 22 performance measures of
which only one was

An area of performance that has received comparatively
more attention than others in terms of its dimensionality is
that of academic achievement. Gaier (1952) studied criteria
for success in medical school by analyzing grades received by
two different classes. The results were "equivocal" because
it was determined that the classes were not equal in either
ability or achievement and further it was believed that the
standards of evaluation varied from class to class. This point
is substantiated by Aiken (1963), who presents results
indicating that the concept of the average student is a function
of the level of performance of the group and is not a stable
abstraction. However, Gaier (1952) op. cit., indicates that
success was based upon ability, motivation and work habits and 
adequate prediction would require broader criteria to allow 
all three to function. Studies by Locke (1963) and Prien and 
Lee (1963), op. cit., of school achievement, indicate at least 
two dimensions; namely, structured achievement and unstructured 
achievement. Additionally, Prien and Lee note a social achieve- 
ment dimension. Preliminary studies by Davis (1964a, 1964b, 
1964c) analyzing faculty and student perceptions of performance 
indicate considerably greater complexity. 

Newman, et al. (1952) studied two classes at the Coast
Guard Academy. The criteria consisted of ratings by peers,
officers and staff both ashore and at sea, "demerit scores"
and course grades yielding over 20 measures for each class. 
Cluster analyses of over 2,000 correlations revealed three 
independent clusters of general adaptability to Academy life 
and activities, physical proficiency and attitudes, and aca- 
demic grades. Since the results for two separate classes agreed, 
it was concluded that the results seemed definitive. 
Graham (1954) and Bair, et al. (1956) both analyzed achievement in
Navy flying training. The former with seven criteria from both
pre-flight and flight training obtained four factors in
achievement. It is interesting to note throughout the table that
two measures of flying ability have virtually zero correlations
with all other measures and agree with each other to the level of
.28. The latter study with 12 measures of achievement in pre-flight
resulted in only three achievement factors. Of course this study
also showed higher intercorrelations since only grades were
measured, but even here the highest correlation obtained was .72
and that was final navigation grade with the summary grade measure
and is somewhat spurious.

Another study of achievement in the Coast Guard Academy by rettner,
et al. (1959) had as criteria ten academic grades and ratings of
cadets on cruise. Factor analysis of these, along with 20 tests,
showed that criterion scores had significant loadings on six of 15
factors extracted. This analysis, more detailed than that of
Newman, reveals six distinct bases for academic achievement and
rating in one aspect of a total job.

These studies of academic achievement performance measures again
reveal that even rather limited aspects of a "job" are complex.
Unfortunately, none of these studies as yet have reported
comprehensive follow-ups of later careers and thus, the relation of
training achievement to job proficiency is not clearly established;
it can only be surmised that this performance would be even more
complex than training alone.
Some insight as to the complexity of cumulative job performance
data is provided by Richards, et al. (1965) reporting a study of
medical specialists. Eighty performance scores including three
measuring academic performance were factor analyzed. The analyses
yielded 29 factors, and this is viewed as a conservative estimate
of the complexity since no attempt was made to measure patient
responses or the quality of medical care. Of particular interest,
though, was that both pre-medical and medical school performance
were independent of the job performance of the group studied. The
above study is perhaps the most comprehensive one performed to date
and clearly illustrates the magnitude of the criterion complexity
issue.

Performance measures for one job area have been subjected to
several analyses--the sales job. Rush (1953) has presented what can
be regarded as a classic study in the field, or in all personnel
research for that matter. The investigation covering both
preliminary and cross-validation aspects used criteria of percent
of assigned quota achieved, average number of sales, average
monthly volume (all corrected by a base sales figure), grades in a
technical sales school and supervisory ratings on a nine scale
form. From a table of intercorrelations, Rush extracted four
factors of, I - objective achievement with loadings on the
described indices, II - learning aptitude with loadings in grades
and ratings of technical knowledge and learning, III - a general
reputation (halo) factor with loadings in ratings and IV - a sales
technique and achievement factor which had the only communality
between objective sales measures and ratings, much weakened,
however, because of rather low, scattered loadings. From a large
number of predictors including aptitude tests, a personality
inventory and personal history items, multiple regression equations
were constructed to predict each factor. Only IV did not produce a
significant multiple R, and the best predictors for each were
different in every case. It is of interest to note that one
predictor, "number of accounting courses," which had been used as
positive actually had a substantial negative relationship with
Factor I in the later analysis. This study illustrating the
multidimensionality of job performance, as is specifically
commented in the study, also embodies some other points previously
mentioned, as the relatively low relationships of varied
performance measures and the lack of relationship between objective
sales measures and rating of sales ability. It would appear there
are actual achievements in both sales and training and then an
unrelated supervisory opinion concerning the achievements and, the
suspicion exists, that this is common to many other fields of work.

Two other studies with objective measures of selling performance by
Kirchner (1960) and Miner (1962) present tables of
intercorrelations of various performance measures showing
relatively high, positive correlations among them. These would seem
to indicate the possibility of a single "selling ability" factor,
however, another study by Baier and Dugan (1957) using 13 objective
measures of sales achievement by insurance agents presents a table
of intercorrelations that obviously contains more than one factor
and indicates that selling ability in at least one field is not the
unitary ability that might be supposed but is more accurately
described by Rush's study, op. cit.
Of course, sales achievement has been so little explored, in the
sense of these later comprehensive studies, that only the most
tentative judgments are possible; however, it does appear that some
general principles have emerged. As indicated in an early study by
Dorcus (1940) the establishment of an "objective" criterion is a
quite ticklish procedure and overlooking even seemingly minor
aspects can apparently seriously bias such criteria. In fact,
Dorcus constructed "economic maps" of sales territories to furnish
base points. Related to this point is the sheer number of criteria;
if relatively few are used it appears there is a greater tendency
for them to be more closely related in a positive manner, perhaps
the result of a limited view of the actual possibilities. Finally,
the temporal aspects of the stability of relationships are
virtually unexplored.

One type of investigation that can perhaps illustrate the
multidimensional nature of job performance better than any other is
that where criteria of quite diverse nature are specifically
investigated or where values of equally diverse performance
predictors are assessed. Gadel and Kriedt (1952) in a study of 193
IBM operators determined job satisfaction and interest by
questionnaires and job performance by supervisory ratings and
obtained the following intercorrelations:









                    Performance    Satisfaction    Interest

    Satisfaction        .08

    Interest            .08            .44

    Aptitude            .41           -.11           -.11
Ferguson (1950) in a study of the utility of aptitude, interest and
personal history items ("Economic Maturity") in predicting the job
performance of insurance agents concluded that personal history and
aptitude items predicted performance whereas, survival was
predicted by interest. He also hypothesized that aptitude is a
joint function of interest and ability and that long term
prediction will depend much more upon interest than upon ability.
Clark (1961) in a quite comprehensive study of several Navy
technician groups found virtually zero correlation between aptitude
and interest measures and yet substantial validities for both in
the prediction of technical school grades.

These selected studies indicate that performance is completely
based in the individual himself and, it is to be presumed, results
in complex effect upon job performance. It will require established
performance measures before the functioning and relative importance
of these variables within individuals can be determined with any
degree of accuracy and comprehensiveness.

Other studies have demonstrated points that are of interest in this
section. Bartelme, et al. (1951) using a version of a driving skill
test, developed by the American Automobile Association, attempted
to predict Army truck driver performance. The interesting point is
that the test battery predicted the criterion to the extent of .24
for light, .11 for medium, and -.12 for heavy vehicle drivers. If a
generalization is warranted on the basis of a single study, it
would be indicated that a rubric covering a job as here, truck
driver, must be carefully investigated before being accepted.
Lawshe and McGinley (1951) in a study of proof reading performance
found a correlation of .O' between productivity and errors to
indicate the likelihood of these being independent measures of
achievement in a single job.



Another approach to determining the dimensions of job performance
is that of using what might be called organizational indices to
evaluate performance. To illustrate, Clarke (1946) found a
correlation of .52 between absenteeism and turnover which obviously
is much higher than many attempted predictions of turnover. Palmer
and Schroeder (1961) show that theft of company materials is
inversely related to the practice of allowing employee discounts.
Comprehensive or conclusive studies to identify basic dimensions
which could be ascribed to the organization are not available.

Heron (1954) used six criteria to evaluate the performance of bus
conductors. They were Gross Earnings, "Shorts" on cash for tickets
sold, number of periods of absence, disciplinary actions, times
late for duty and a supervisory rating on how much the employee was
a "source of concern" to his supervisor. The intercorrelations and
a factor analysis revealed:

    1. Supervisory Rating     304

    2. Gross Earnings

    3. Shorts
A study of skilled tradesmen by Ronan (1963) factor analyzed 11 job
performance variables. Included were ratings as well as personnel
file data on organization type indices. The analysis revealed four
factors in these various indices of performance. A study by
Fleishman, et al. (1955) used three of the same variables derived
in exactly the same way, i.e., absenteeism, accidents and
grievances. The table below shows the comparative correlations as
study I, the former, and II, the latter:













                               Accidents    Grievances

    1. Absenteeism      I         .05          .24
                        II       -.20          .37

    2. Accidents        I
                        II

    3. Grievances





A similar comparison for the common variables for the Heron and
Ronan studies shows:

    Supervisory Rating      .29      .38

    Absence

    Disciplinary Action









These three studies seem to show considerable stability of
relationships over widely varied organizations and populations
despite relatively low reliabilities for some. The reliabilities,
estimated from the communalities, in Heron's study were .493 for
rating, .426 for absence and .148 for disciplinary action. The same
in Ronan's study were respectively .543, .612 and .232. Fleishman,
et al., obtained reliabilities, corrected by the Spearman-Brown
formula, of .85 for absence, .72 for accidents and .73 for
grievances. All of these studies covered comparatively long periods
of time which may have allowed relationships to appear that
ordinarily do not do so. Research by Penn (1955) on the reliability
of accidents indicates that reliability increases as the duration
of exposure increases and as hazard increases. For the high hazard
group maximum reliability was reached during the second of a
four-year period. For the medium hazard group increases were found
through the third year. The low hazard group first year reliability
was not significant (r = .09, N = 50) but reliability increased
throughout the period. Maximum reliabilities for the high, medium,
and low hazard groups were .87, .87, and .55 respectively.
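
For readers unfamiliar with the Spearman-Brown correction mentioned
above, the sketch below states the general formula and its inverse.
It is a minimal illustration, not a reconstruction of the cited
computations: the function names are hypothetical, and the
lengthening factor k = 2 (the usual split-half case) is an
assumption, since the text does not say how the records were
divided.

```python
def spearman_brown(r, k=2.0):
    """Projected reliability when a measure with reliability r is lengthened k times."""
    return k * r / (1.0 + (k - 1.0) * r)


def uncorrected_reliability(r_corrected, k=2.0):
    """Invert the formula: reliability of one part, given the corrected value."""
    return r_corrected / (k - (k - 1.0) * r_corrected)


# Assumption: each corrected value reported above came from two comparable halves.
for label, corrected in [("absence", 0.85), ("accidents", 0.72), ("grievances", 0.73)]:
    part = uncorrected_reliability(corrected)
    print(f"{label}: corrected {corrected:.2f}, implied part reliability {part:.2f}")
```

Because the corrected figure always exceeds the part reliability when
k is greater than 1, corrected values are not directly comparable
with the communality-based estimates quoted for the other two
studies.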

It would appear that further studies of these "organizational
indices" might well be fruitful in attempted criterion development.
Whatever else might be said they do evaluate "real" aspects of
performance as contrasted with ratings which may or may not be
doing so. It is possible that further development might lead to a
partial, ultimate criterion if the contradiction can be accepted.
For example, a dollar value could be estimated for absenteeism,
accidents, turnover and many other such measures. In this way, for
some dimensions of performance, the "dollar criterion" concept of
Brogden and Taylor (1950) might be approached. This probably would
require more limited statements about validity, but if all these
independent performance indices are to be predicted, the predictor
battery likely would be immense. Another problem would be the
delayed impact phenomena characteristic of higher level jobs. The
true and complete impact of a mode of performance, an act or a
sequence might not occur for years. Even when the results become
manifest, the question of assigning responsibility would remain.
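
The idea of expressing several organizational indices on a common
dollar scale can be sketched very simply. The example below is
hypothetical throughout: the unit costs, the index names and the
function dollar_cost are invented for illustration and are not taken
from Brogden and Taylor or from any study cited here.

```python
# Hypothetical cost per occurrence of each index, in dollars (illustrative only).
HYPOTHETICAL_UNIT_COSTS = {
    "absences": 40.0,
    "accidents": 500.0,
    "grievances": 75.0,
}


def dollar_cost(record, unit_costs=HYPOTHETICAL_UNIT_COSTS):
    """Sum each index count times its unit cost; indices missing from the record count as zero."""
    return sum(cost * record.get(index, 0) for index, cost in unit_costs.items())


# One employee's counts over some fixed period (made-up numbers).
employee = {"absences": 2, "accidents": 1}
print(dollar_cost(employee))
```

Such a composite keeps the separate indices visible in the record
while giving a single figure for the dimensions that can be priced;
it says nothing about dimensions with no assignable cost or about
effects that appear only after a delay.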

An indication of the limitations of the studies cited immediately
above and the complexity that might be encountered is found in a
study by McQuitty, et al. (1954). Here behavioral descriptions of
best-average-poor aircraft mechanics were obtained from peers and
supervisors. From these a "descriptive inventory" was constructed
and 428 line supervisors rated some hundreds of mechanics. A factor
analysis of the ratings extracted 23 factors which accounted for
only 50% of the variance. Obviously there are a large number of
relatively independent behavior dimensions related to job
proficiency. This study found interest, character, personality and
aptitude measures of importance with only the limited criteria of
supervisory ratings. If, in addition, other criteria were to be
used, the task of isolating all the possible relationships appears
staggering if indeed it can be done in the foreseeable future.

In summary, the information presented in this section indicates
beyond any doubt the multi-dimensionality of job performance; in
fact, the phenomenon is characteristic of even limited aspects of a
job as was shown in flying an aircraft. To attempt to evaluate job
performance with a single measure is worse than useless, it is
misleading; and, for ratings, to keep in perspective all dimensions
of performance while rating would appear impossible.



Is Job Performance Modified by
Extra-Individual Conditions

Tacit in the design of most research studies has been the
assumption that job performance is directly the result of
characteristics of the individuals involved. Predictors of many
sorts have been used to describe individuals with the results
related to some performance criterion but it is rare to find, in
any single study, an attempt to determine if intra-individual
characteristics are the only sources of variability in performance.

The possible existence of biasing conditions within the situation
has been called the single most important criterion problem by both
Bechtoldt (1951) and Cureton (1951). As Anastasi (1950) has pointed
out, even though the shortcomings of any criterion are known, the
operational results are, that if it must be used, only the
interpretation of validity coefficients would be changed, there
still remains the relationship of predictor and criterion
behaviors. Kipnis (1960) op. cit., has presented evidence to show
that performance ratings are distorted by supervisor-subordinate
relationships and the context in which they occur. Katzell (1962)
has pointed out the general inadequacies of present day
organizational theory in any attempt to assess effects of job
performance as a dependent variable. The most comprehensive
statement of the considerations in this area is that of Horst
(1941) op. cit. In stressing the lack of and need for research on
behavioral fields the statement is made, "Without dwelling further
on the point, it is clear that an individual's performance in an
activity cannot be viewed as an isolated phenomenon outside the
environmental context in which the activity takes place. The
activity must be analyzed, not only in terms of the characteristics
of the person engaged in it, but in the light of the principal
external conditions which may influence it."

The general trend of opinion and what little evidence exists seems
to be that situational conditions can modify individual job
performance. To result in such behavioral changes, it is necessary
to assume that the individual has in some way been changed. It
seems unlikely that such conditions could alter individual
aptitude, ability or interest (in the sense of interest inventory
measurement) levels, the changes must have an attitudinal,
motivational or some such taxonomic base. It would seem that such
changes would result from reactions or perceptions largely based
upon personality traits. One of the assumptions is that attitudes,
and specifically job attitudes, are in some way related to
personality characteristics. These relations were established in a
study by Svetlik (1961) but only with any definite degree for those
individuals manifesting some type of career concern (voluntary
referrals for vocational counseling).

The whole area of personality traits, attitudes, morale and their
relations to criteria of performance has proved extremely difficult
to attack and as yet only the most tentative results have been
obtained. An early study by Lurie (1942) op. cit., found three
factors of occupational adjustment from a factor analysis of 12
indices; however, this general approach has not been directly
followed by similar studies but has been approached in different
ways.
An experimental study by Heron (1957) used scores from 22
personality tests given to 30 unskilled (pour lead in molds)
workers. The criteria were average productivity for 67 weeks and
ratings by six supervisors, the correlation matrix was factor
analyzed. Four factors were found but none were related to the
criteria. What was found is quoted, "It seems in the sample studied
it is the heterogeneous group of relatively unstable men who tend
to be a source of concern to their supervisors." This is based upon
the fact that two of the factors correlated to the extent of .53
with "job adjustment."

Peck and Parsons (1956), using Worthington's application blank as a
diagnostic instrument, found relatively high correlations, up to
.77 rho, with production and "favorable" personality traits. They
also found, as an incidental comment on performance reliability,
that high producers showed little variation of production whereas
low producers were much more variable. There is some possibility of
criterion bias in this study since employees were working against
production standards and, in such situations, it is quite common to
find agreed upon levels of production. It is also worthy of note
here, as in Heron's study, the persons with unfavorable personality
patterns were chronic supervisory problems. Heron has suggested the
possibility that poor job performance is the independent and poor
adjustment the dependent in the work situation. Such an hypothesis
is not beyond the realm of possibility, but there is little
supporting evidence.

A partial clue to the discrepancy between the two studies cited
above may be found in the results of two other studies. Kipnis
(1962) using a specially developed test, found "persistence beyond
minimum standards on tiring tasks" did predict performance
particularly among lower aptitude individuals. Eysenck (1963), op.
cit., has shown that motivation has optimum levels and that its
relationship to performance is curvilinear. From these studies, it
would appear that performance can be affected by personality traits
but, in general, the relationship is quite complex and may be in
the nature of moderators. Parenthetically, the existence of
"trouble-makers" in industrial organizations has often been
doubted, but it appears from the first two studies that they do
exist and, whether or not their performance is good or poor, they
can be undesirable employees on another dimension of job
performance.



Regardless of whether one is willing to attribute performance
effects to the variables considered in this section, the fact
remains that a great deal of effort has been expended in their
investigation. There have been, first, studies of organizational
features or characteristics as shown by their relation to various
objective indices of performance.

Using one index, turnover, Parkinson (1928), in a study of
99 selling and distributing organizations with over 60,000
employees, found higher turnover related to larger organizations.
He says, "The outstanding observation from the canvas is that
personnel conditions (labor turnover) are least favorable among
the large forces as a class, and the most favorable among the
small organizations." However, Parkinson specifically points
out that such a generalization is not completely warranted
since there is considerable overlap and some large organizations
have distinctly favorable turnover rates. c^s.'^'^atsky (19b L)
presents data to show that in larger departments, within a
single organization, the turnover rate is higher than in the
small departments. Greystoke, et. al. (1952) have presented
data to show that the question of turnover as related to
organizational size needs further investigation since no distinct
linear trends regarding company size or department size were
evident for employees of either sex.

These data, along with the qualifications mentioned by
Parkinson, indicate that while a relationship between
organizational size and turnover does exist, it is by no means a
simple one. In any case, using turnover as a criterion would
require taking into account organizational size as a factor
affecting performance, but the contributing effects of other
influences would also need to be investigated.

Another single index that has received some study, as
affected by extra-individual factors, is that of accidents. Kerr
(1950) and Keenan, et. al. (1951) both have studied organizational
characteristics as related to accident rate. The former
concerned 53 departments in one company with conclusions, stated
as only tentative, that accident frequency is greatest in those
departments with "lowest intra-company transfer mobility rate,
smallest percent of employees who are female and on salary,
least promotion probability for typical employee, and highest
mean noise level." For severity the findings were, "...heavily
male in sex ratio for salary as well as production personnel,
low in mean promotion probability, low in fertility of
suggestion field, low in employee suggestions contributed, high
(relatively) in average employee age level, and higher in
average employee tenure." An incidental finding of this study
was correlations between accident frequency and turnover of
.03 and with severity of -.20. The latter study covered 1,945
lost time accidents, 7,108 employees and the years 1944-48.
With supervisors rating "departmental conditions" of 44
departments, the tentative findings were that promotion
probability increases safe behavior, that "comfortable shop
environment" is a major determinant of safe behavior, and that
crew work brought higher injury rates, as did greater manual
effort involved.

As in the case of turnover there is evidence to show that a
moderator is at work with both injuries and lost time
accidents, as a measure of job performance. A possible moderator
suggested by Berber (1958) is that employees on an incentive
pay system may have fewer reported injuries as evidenced by
dispensary visits than do day-rate workers simply because
minor injuries are economically costly to the employee.
Kossoris (1940), using data supplied by the Wisconsin Industrial
Commission, the Swiss National Accident Insurance Fund and the
International Labor Office Study of Austria, studied the
frequency rate of accidents as related to age over some 500,000
industrial accidents. He found older workers less susceptible
to injury than younger workers, but older workers had more
serious accidents and required longer to recover. As in the case
of turnover as a job performance criterion, it would appear that
accidents are to some extent related to situational
characteristics but, as yet, the detailed relationships are hazy.

An interesting study by Marriott (1949) evaluated work
group size as related to output measured by piece work earnings
per man. In two separate studies with 153 groups and 79 to 98
groups, low inverse correlations were found, indicating smaller
sized groups generally showing larger output. One exception
was that groups of over 50 showed larger output than those
immediately smaller, presumably because this is the point of
mechanization and/or the group has become so large that its
influence has decreased. Actually there is little information
related to this particular topic, the more recent burst of
activity concerning small group effects having largely been
confined to decision making, assumption of leadership or similar
topics.

The existence of another consideration in studies of the
type presented above was shown in a study by Ferguson (1951),
op. cit. In comparing validities for the Life Insurance Aptitude
Index, a wide variation was found across districts even
though score distributions were comparable. It was apparent
that the evaluations of job performance were quite different
by the managers of different districts or agencies.

Cureton and Katzell (1962), in a study of 72 divisions of
a company using five measures of divisional performance and five
descriptive situational variables, found two factors showing that
a non-urban culture pattern reflects small plant and community
size and relatively higher productivity and profitability,
whereas the other, urban, shows lower wages, fewer female
employees, no union and higher turnover. Thus, one aspect of an
organization, situs, had differential effects on several measures
of overall performance.

These studies of specific aspects of performance or
specific independent variables indicate that situational variables
probably have some effect on job performance; however, there
are indications that these single aspects are acting in larger
contexts. Recently this aspect of job performance measurement
has received increasing attention and more comprehensive studies
have been completed.

Stodgill, et. al. (1955) studied the administrative
behavior of 470 Navy officers in 45 positions, in 47 organizations,
and from Ensign to Admiral. A factor analysis of the data
yielded eight factors that tended to group individuals by the
type of position they held. It was also found that types of
positions tend to be found either in small or large
organizations or in ship as opposed to shore units. It was clear
from the study that performance is at least in part determined
not only by job demands but by particular job and place. In terms
of job performance measurement it was suggested that measures
of job performance patterns might be devised as opposed to
evaluation of such traits as initiative, judgment, etc.

Turner (1960), op. cit., in a study of foreman performance
in two different plants, factor analyzed two matrices of
intercorrelations obtained from 11 objective measures and a nine
trait rating of job performance. For both plants three similar
factors emerged covering rated performance, probably halo
and reputation, an employee relations factor and a bi-polar
factor covering scrap and suggestions indicating that good
performance on one is accompanied by poor performance on the
other. Two other factors were much more poorly defined and
different in the two plants. One might possibly be "structuring"
as described by Fleishman, et. al. (1955), op. cit., but the
other seems to indicate specificity to a particular plant.
Again was found in this study the lack of relationship between
ratings and objective measures as previously discussed,
relatively low reliabilities of certain individual measures,
particularly if measured over short time spans, and the
multi-dimensional aspects of performance assumed by the author, "It
appears there is more than one pattern of foreman success and
that it may be unrealistic to expect foremen to do well on all
aspects of the job." This study does indicate the possibility
of establishing at least some performance indices that are common
in various organizations, but it also indicates the possibility
of specificity to a single unit.

Wherry, et. al. (1961), in a comprehensive study of three
Air Force career fields, have delineated the complexity of
approaching job performance from both total performance and
organization points of view. In this study, several measuring
instruments were specifically developed. They were an opinion
inventory to measure job satisfaction, an effectiveness rating
scale and a specialized interview with supervisors. In addition,
peer nominations were obtained along with a host of personal
history items, aptitude and achievement test scores and
indicators of military achievement. Intercorrelations and factor
analysis yielded six performance factors. These were:
(A) - General Competence, (B) - Promotion Potential,
(C) - Career Orientation, (D) - Peer Recognition,
(E) - Job Satisfaction, and (F) - Job Centeredness. A
parenthetical point of interest is that aptitude scores were
related only to the General Competence factor. From the
point of view of this section two quotes by the authors are
pertinent: "...seem to indicate the need for multi-dimensional
evaluation of airmen performance," and, "There was considerable
similarity of loadings across the three career fields to
suggest that a universal criterion for job evaluation is possible."
Other tentative results of the study were that it may be possible
to determine training needs with this procedure and to discover
supervisory potential early, and that only six scores would be
needed to predict the six factors.

Seashore, et. al. (1960) presented a study that would seem
to cast some doubt on the utility of measuring across
organizations or jobs and the use of various organizational indices
as criteria. The study used as criteria overall effectiveness
(a rating), productivity, chargeable accidents, unexcused
absences and errors, the latter four being objective measurements.
In the evaluation, three hypotheses were evaluated: (1)
intercorrelations of job performance measures will be consistent,
(2) patterns of intercorrelations among the variables will be
similar in size and sign as between individual and organizational
levels of analysis and (3) relationships among job performance
criteria for individuals in any one organization are
representative of relationships over a set of homogeneous
organizations. The authors found for (1) that three of five criteria
were internally consistent and the hypothesis received some
support, for (2) the results were inconclusive and for (3)
rejection as "one must conclude from this evidence that the
relationships among various aspects of job performance are highly
variable..." In evaluating the study it should be pointed out
that data were collected for a period of only one month.

Turner (1960), op. cit., says, "Single monthly scores on
criterion measures tend to have inadequate reliability across
time. Averages of several monthly scores are needed to attain a
satisfactory level of reliability." Turner bases this statement
upon his study where reliabilities over one month ranged
from .03 to .59 with a median of .35. Over 3 1/2 to 5 months,
reliabilities as estimated by communalities presented by Turner
were from .14 to .92 with a median of .82. Most of the higher,
of course, were in ratings but even the objective measures had
a median in the .60's. Penn (1955), op. cit., indicates the
increase in reliability of accidents leveling off after 1 year
for high hazard jobs and still increasing at the end of four
years for low hazard jobs. This temporal aspect of job
performance is one that has received very little attention;
longitudinal studies over any time periods exceeding one year are
the exception. It might be well to take up the topic here.
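
The dependence of reliability on the number of monthly scores
averaged can be illustrated with the Spearman-Brown formula. The
sketch below is an illustration only and is not the procedure used
by Turner, whose longer-period figures were estimated from
communalities. The function names are hypothetical, and the
formula assumes that the monthly scores are parallel measures,
which the studies reviewed here do not establish.

```python
def spearman_brown(r_single, k):
    """Projected reliability of the mean of k parallel scores,
    given the reliability r_single of one score."""
    return k * r_single / (1.0 + (k - 1.0) * r_single)


def scores_needed(r_single, r_target):
    """Number of parallel scores whose mean would reach r_target
    (the Spearman-Brown formula solved for k)."""
    return r_target * (1.0 - r_single) / (r_single * (1.0 - r_target))


if __name__ == "__main__":
    # .35 is the median one-month reliability quoted from Turner;
    # the values of k are arbitrary illustrations.
    for k in (1, 4, 12):
        print(k, round(spearman_brown(0.35, k), 2))
```

Under these assumptions a single-month reliability of .35 projects
to about .68 for an average of four monthly scores. Since Turner's
longer-period values were obtained by a different method, the two
sets of figures should not be expected to agree exactly.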

Viteles (1929-30) found over almost a two-year period
that substation operators, classified into three groups by 13
supervisors, confirmed the classification using an "error"
criterion, the poorest group having over seven times as many errors
per man as the best group. Here, in contrast to previously
cited studies, a rating is confirmed by an objective criterion
but also, in contrast, by records kept over a comparatively long
period of time.

Another early study by Ball (1938) found a correlation
of .71 between mental ability, as measured by a relatively
simple test, and occupational status of office workers after
an 18-year period. Further, since there is no evidence of
contamination in the study, it appears that the coefficient found
is a good estimate of the relationship. Stead (1937), op. cit., in
a study of department store sales personnel, used eight objective
measures of performance on a year-to-year basis. He found
reliabilities of .83 to .98 and a multiple correlation of
.65 for six tests with a combined criterion. Strong (1934-35),
op. cit., in a study of insurance sales as a criterion found
reliabilities of .77 to .84 on a year vs. year basis and in
another study (1943), op. cit., found reliabilities of .74 to
.84 correlating two years' production (1925-27) with production
for the years 1929-30. Knauft (1955) correlated test scores
from a general mental ability test (L01L?i.-I, 15 minutes) with
job level obtained over a 17-year period. A correlation of
.60 was obtained over seven job classifications and, further,
this is uncorrected for a restriction of range which would
probably raise it to near the value Ball, op. cit., found for
a similar period of time.

Whitlock, et. al. (1963), in a study relating "unsafe
behaviors" to incidence of accidents, specifically studied, as a
facet of the investigation, the influence of time on the
relationship. The trend of the data is for the relationship
to increase with time. A specific recommendation of the
study is that investigations in this general area of job
performance must allow sufficient time for relationships to
become apparent.
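
The correction for restriction of range mentioned above in
connection with Knauft's coefficient of .60 can be sketched as
follows. This is an illustration using the standard correction
for direct restriction on the predictor (often called Thorndike's
Case 2). The source does not say which correction, or what degree
of restriction, was intended, so the ratio of standard deviations
is purely hypothetical, as are the function names.

```python
from math import sqrt


def correct_for_range_restriction(r_restricted, sd_ratio):
    """Correct a correlation for direct range restriction on the
    predictor. sd_ratio is the unrestricted predictor SD divided
    by the restricted SD (an assumed, externally supplied value)."""
    r = r_restricted
    return r * sd_ratio / sqrt(1.0 - r * r + r * r * sd_ratio * sd_ratio)


def implied_sd_ratio(r_restricted, r_target):
    """SD ratio that would be required for the corrected
    correlation to equal r_target."""
    r, t = r_restricted, r_target
    return sqrt(t * t * (1.0 - r * r) / (r * r * (1.0 - t * t)))


if __name__ == "__main__":
    # .60 is Knauft's reported value and .71 is Ball's; the ratio
    # computed here is what the correction would need, not a
    # figure reported in either study.
    ratio = implied_sd_ratio(0.60, 0.71)
    print(round(ratio, 2))
    print(round(correct_for_range_restriction(0.60, ratio), 2))
```

On this formula, the unrestricted standard deviation would need to
be roughly one third larger than the restricted one for a
coefficient of .60 to rise to .71.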

In contrast to these studies, others show that longer time
periods tend to attenuate relationships. The study by Bayroff,
et. al. (1954), op. cit., presents evidence to show that
reliability of ratings tends to drop, this with four ratings over
a period of weeks. Ghiselli and Haire (1960) studied taxi cab
drivers over the first 10 weeks of their employment and found
that, in general, validities dropped, there was no single
consistent predictor and validity correlations change when
different criteria are used. Bass (1962), op. cit., found that
ratings of sales personnel showed lower relationships over a
42-month period.

Actually these latter studies, concerned with ratings and
relatively short time periods, serve to emphasize the need for
longer time periods and the questionable utility of ratings
as criteria. The longer studies previously described using
more objective criteria show quite substantial relationships
even with simple predictors and high reliabilities for the
objective performance measures. Studies of the kind are, of
course, difficult to conduct because of the necessity of record
keeping, the influence of learning with new employees, the ever
present danger of contamination, and sample attrition. However,
it appears, from the limited evidence available, that more
studies will need to be conducted for a full appreciation of
job performance and criteria development.

An area where the influence of extraneous factors on job
performance has been extensively studied concerns that of
leadership or supervision effects on morale, attitude or more
directly some measure of job performance. For example, Matthews
(1951), after a review of studies of leadership to that time,
reached as a partial conclusion, "Intercorrelations among
various measurements of leadership were low but positive. There
seems to be some tendency for those who are leaders in one
situation to be leaders in other types of situations. However,
a considerable portion of the leadership variance cannot be
attributed to [illegible] probably must be attributed
in part to [illegible] recognize
that there are probably certain general requirements and also
that there are certain requirements which are unique for the
particular leadership situation one has in mind." Matthews
also points out that up to the time at which he was writing
there were few studies to show the effects of leadership on
performance, primarily because of lack of suitable criteria.

Fleishman, et. al. (1955), op. cit., in a comprehensive
study of industrial foreman leadership isolated two factors
called "Consideration" and "Initiating Structure." Essentially
the former factor describes "a more friendly, trusting person
who develops a certain warmth between the leader and the
group," while the latter factor describes a person who is more
prone to define his relationship to the group and the roles he
expects to be played, and who organizes the job. The scores for
each of these were correlated with proficiency (management rating)
and four objective indices of performance in both production
and non-production departments.

It might be mentioned that reliability correlations, as
measured by separate administration, for the leadership
designations were .53 for Consideration and .46 for Initiating
Structure. The study also found that workers liked a foreman
high on Consideration but a foreman is considered more
proficient by superiors if he is higher on Initiating Structure,
"consequently there appears to be a conflict between morale
and efficiency." We have here again a situation where by
performance, i.e., Consideration, a foreman might reduce
absenteeism, accidents, grievances and turnover but in the opinion
of his superiors he would not be proficient on the job. This
study also points out the situational variables in leadership:
different leaders may be required in production and
non-production departments. One point the study most forcefully
indicated was the extreme complexity of the leadership-job
performance relationship.

Cleven and Fiedler (1956), using an instrument to measure the
"ASo" score, i.e., supervisory prediction of subordinate's
behavior, which dichotomized supervisors into more accepting,
approachable individuals as opposed to more critical, analytic
persons, found proficiency of work crews under the latter to
be much more predictable. This was true of supervisors in
more direct contact with the crews while supervisors more distant
from the crews, or work site, did not show such predictions. The
study is of particular interest because it covers a longer time
period than usually found and the criterion, tap-to-tap time
of open hearth heats, is almost completely objective. In addition,
the odd-even months criterion reliability was found to be .82.
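
An odd-even months reliability of the kind reported here can be
computed as sketched below. This is an illustration under stated
assumptions and not a reconstruction of the original analysis: the
data are invented, the function names are hypothetical, and the
source does not say whether the reported .82 is the half-length
correlation or a value stepped up with the Spearman-Brown formula.
Both are returned so that the distinction is visible.

```python
from math import sqrt


def pearson(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def odd_even_reliability(scores):
    """scores: one list of monthly criterion values per work unit.
    Returns (r_half, r_full): the correlation between odd-month and
    even-month means, and its Spearman-Brown step-up to full length."""
    odd = [sum(row[0::2]) / len(row[0::2]) for row in scores]
    even = [sum(row[1::2]) / len(row[1::2]) for row in scores]
    r_half = pearson(odd, even)
    r_full = 2.0 * r_half / (1.0 + r_half)
    return r_half, r_full


if __name__ == "__main__":
    # Invented monthly scores for four hypothetical crews.
    crews = [
        [10, 12, 11, 13],
        [14, 13, 15, 12],
        [9, 8, 10, 9],
        [16, 17, 15, 18],
    ]
    print(odd_even_reliability(crews))
```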

The finding by Cleven and Fiedler agrees with that of
Fleishman, et. al. (1955), op. cit., in that the supervisor showing
Initiating Structure is regarded as more proficient by management.
However, this work group also shows higher rates on some
undesirable indices, for instance, accidents. It appears from
these studies that methods or techniques of supervision have some
influence on job performance, but they have their effects in
complex, in fact contradictory, ways. This is further supported
by the studies of Turner (1960), op. cit., where bi-polar factors
were found in foreman performance.

Two reviews of the literature of this area, Brayfield and
Crockett (1955), op. cit., and Herzberg, et. al. (1957), have
arrived at somewhat different conclusions. The latter cites 26
studies covering the relationship of satisfaction or attitude
to productivity. It was found that 14 showed a positive
relationship, nine showed no relation and three a negative
relationship, and it is concluded that supervision "definitely
affects" productivity to some degree. The former concluded, in
general, that any relationships were quite nebulous; in fact,
the efforts in the entire area were seriously questioned on
such bases as sampling involved, inadequate criteria, bias of
self-report and group statistics. On a theoretical basis, it
was also questioned why morale, attitude, etc., should be
related to productivity, no one-to-one relationship ever having
been clearly established. Further, the complexity of human
goals, needs, satisfaction and such designations when placed
in a complex situation of a work system has been most
inadequately explored. Such an analysis would involve
individuals, the factory social system, the work group, union and
community at large. The authors said, "We seem to have
arrived at the position where the social scientist in the
industrial setting must concern himself with a full-scale
analysis of that situation," and, "Pursuit of this goal should
provide us with considerable intrinsic job satisfaction."



This complexity, not intrinsic job satisfaction, was
indicated in a paper by Kahn and Morse (1951) which indicated
the probable dimensions of individual satisfaction, the
independent variables, the uniqueness of individual needs and the
likelihood of interactions in a work situation. To some
extent attempts at systematic investigation of these have been
made by the Survey Research Center at the University of Michigan.
In summarizing some of the studies, Maccoby (1949) tentatively
concluded that more pressure from above exerted on a supervisor
gave lower productivity and supervisors assuming a "leadership
role" gave higher productivity. Mann and Dent (1954),
reporting studies by the same group on what makes an effective
supervisor, found "employee orientation" important by both
subordinates and management (contrast this with the previously
described Fleishman, et. al., study) and, with Likert (1951),
the importance of voluntary communication by supervisors and
recognition of subordinates. Pelz (1951), from some of the same
studies, stresses the "power" of a supervisor, that is, how
influential he may be with his own superiors. All of these have
shown some relation to productivity. One later study from
the Survey Research Center groups by Indik, et. al. (1961)
has attempted to study some of these findings on the basis of
four hypotheses. These concerned the enhancement of job
performance by opinions of superior-subordinate communication,
supportive behavior by superiors, mutual understanding among
members and feelings of influence over local operations. With
four criteria of recorded production, "station" production,
and ratings of individual effectiveness and station
effectiveness, generally positive associations were found in all
tests for the organization as a whole and stations as such;
however, analysis of individual stations gave widely varying
results. Here again only a one-month time period was covered.
A longer time might have given more opportunity for relationships
to become pronounced.

The Southern California Organization Research Project
published a series of studies which generally supported the findings
of the Survey Research Center. However, two of the studies
dealing with skilled craftsmen at the San Diego Naval Air
Station (Wilson, et. al., 1953 and 1954) found supervisors
of high and low producing groups similar in the first study, and
with no differences in a second. These studies have also found
the existence of curvilinear relationships and that effective
supervisors have more confidence in their subordinates both in
personal and performance aspects, and they have, in the more
recent investigations, tended to become critical of psychologists'
emphasis upon the interpersonal as contrasted with the technical
aspects of supervision.

In summary, the investigations of the effects of
supervision and/or organizational characteristics seem to indicate
some rather modest effect. However, negative findings or
specificity always create a nagging doubt--Is supervision a
moderator?
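
The moderator question can be made concrete by comparing a pooled
predictor-criterion correlation with the correlations found
within levels of a suspected moderator. The sketch below is one
simple way of doing so and is not a method taken from any study
cited here. The records are invented, and the moderator labels
("small" and "large" units) and function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_by_subgroup(records):
    """records: (moderator_level, predictor, criterion) tuples.
    Returns the predictor-criterion correlation per moderator level."""
    groups = {}
    for level, x, y in records:
        xs, ys = groups.setdefault(level, ([], []))
        xs.append(x)
        ys.append(y)
    return {level: pearson(xs, ys) for level, (xs, ys) in groups.items()}


# Invented data: the relationship runs in opposite directions in
# the two kinds of unit, so the pooled coefficient is only modest.
RECORDS = [
    ("small", 1, 2), ("small", 2, 3), ("small", 3, 5), ("small", 4, 6),
    ("large", 1, 5), ("large", 2, 5), ("large", 3, 4), ("large", 4, 4),
]

if __name__ == "__main__":
    pooled = pearson([r[1] for r in RECORDS], [r[2] for r in RECORDS])
    print("pooled", round(pooled, 2))
    print(correlation_by_subgroup(RECORDS))
```

In data of this kind a study using one independent and one
dependent variable would report only the modest pooled value,
while the subgroup coefficients differ in sign.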

The question of the influence of situational variables
seems to indicate, from the presented material, that there is
some modification of job performance by such variables; however,
the general conception of studies with one independent
and one dependent variable has led to the situation where
modest relationships, contradictory results or no results at
all have become commonplace. It would seem that studies such
as those by Stodgill, et. al. (1955), and Wherry, et. al. (1961),
both previously discussed, are the immediate need. Reported
studies have indicated specifics to be looked for and evaluated
for their effects on job performance, but experimental
investigations of entire organizations with a gradual working down
toward sub-units and individuals must be conducted before the
parameters of organizational effects can be established with
any degree of confidence.

Conclusions and Hypotheses

From the foregoing review, it is apparent that job
performance variance has been shown or presumed to be a result
of a wide range of causal influences and its measurement is
nebulous. In general, the authors submit that if any
significant progress is to be made toward solution of these
problems, some basic research conceptions will need to be recast
and broadened.

As a possible starting point, it is suggested that job
performance will need to be viewed as both an independent and
dependent variable having measurable outputs resulting from
the interaction of job behaviors, situation characteristics,
and personal characteristics. Broadly these outputs might be
thought of as economic, adjustment, and personal. The first
is measured by production indices, the second by reports such
as absenteeism, and the last by survey techniques or, in
conjunction with certain objective indices, as grievances or
disciplinary actions. If such indices can be established, it
should be possible to design more complex studies encompassing
organizations or broad sub-units to determine interdependence,
relative importance and, most importantly, causal bases of
various dimensions of job performance. In addition, organizational
indices do measure something "real" in contrast to global ratings
that seem to bear little or no relationship to objectively
measured performance. Such indices may not be the most
desirable from the viewpoint of statistical evaluation, for
example skewness, but from a strictly pragmatic stand, they
exist and possibly represent the only avenue for working toward
a better understanding of job performance and criteria.

To define the starting point and guide succeeding
investigations some seemingly fruitful hypotheses are suggested below.

Longitudinal studies (five or more years) will allow much
better predictions of performance than shorter studies.

Performance indices of the organization or sub-units are
required before complete assessment of individual performance
can be accomplished. A sub-hypothesis would be: economic and
"satisfaction producing" effects of job behavior are bi-polar.

Use of "organizational indices" (absenteeism, accidents,
production, scrap, turnover, etc.) as criteria will yield
"purer," more predictable criteria of job performance, tending
toward orthogonality. A sub-hypothesis would be: bi-polar
interrelations will be found, in particular, at higher job
levels.

Organizational indices will reveal common performance
factors for functionally similar jobs with different patterns
of success or failure for functionally different jobs.

Increasing required levels of performance for particular
jobs will result in higher performance reliabilities and
validities. A sub-hypothesis would be: specifically, more
individual liberty to "do the job" will result in better job
performance.

"Human evaluation" of performance will show low
relationships with objective indices of performance. Some
sub-hypotheses would be: some performances can only be evaluated by
human observation, i.e., Heron's (1952), op. cit., "source of
concern to their supervisors"; performance evaluation ability
is a predictable individual difference; reliability of
judgmental criteria will vary inversely with proximity, in
particular, peer evaluation will provide better criteria than
supervisory.

Different predictors and criteria are more appropriate
at different points in time, i.e., training vs. on-the-job,
younger vs. older employees.

Predictor, individual, situational and organizational
patterns of sub-groups and interrelations will be revealed by
splitting criterion groups into halves, thirds, or some other
sections. A sub-hypothesis would be: now unconceived hypotheses
will be uncovered by the major hypothesis.

Job performance variability reliability is a predictable
individual characteristic, e.g., classic reliability theory does
not apply to individual performance. Some sub-hypotheses would
be: unless performance reliability is held constant, group
validities will remain low; performance complexity is inversely
related to reliability; performance reliability is a probable
job performance criterion; situational moderator variables
may inflate or restrict reliabilities.

Certain performances, "creativity" for example, will show
close to zero reliabilities.

Measured performance reliability will increase rs a function 
of (L) time span for measuring' increases and (3) purer cri- 
terion measures. 
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Measure:;' of individual satisfaction as criteria will aad 
another, separate, dimension to job performance, A sub-njT’po- 
thesis would be: job satisfaction is an encores sion of a more 

deeply based general satisfaction. 

Job performance variance, resulting from morale and atti-
tudes, is comparatively small. Some sub-hypotheses would be:
morale and attitudes function as moderator or mediating vari-
ables and do not directly affect performance; half the variance
from the major hypothesis is specific; sources of this variance
affect job performance differently at different levels and
different jobs.

Moderator variables, most as yet untested, will have to
be isolated to determine differential effects on predictor-
performance relationships. Some sub-hypotheses would be: such
effects will be substantial in the case of basic differences,
as sex, for different jobs; functional job analysis will re-
veal conflicts in job composition, the conflict resulting from
opposing requirements of person characteristics; functional
situational analysis will reveal moderator variables heretofore
defined as job performance variables of jobs in higher, the
same, or lower levels in the organization hierarchy.

Classical statistical techniques will give way to some
form of "pattern analysis" in analyzing and predicting job per-
formance.

Whether or not the above hypotheses are adequate it is
apparent that some drastic research approach is required if
any progress is to be made in personnel research. Ghiselli's
review (1955) showed that little progress had been made after
approximately 40 years of research effort. However, some re-
cent studies have made promising beginnings in the direction
indicated by the above hypotheses.

In measurement, Weitz (1961), op. cit., shows that con-
clusions in our experiments are dependent upon the criterion
employed. Guion (1961), op. cit., and Dunnette (1963b), op. cit.,
have proposed modifications of the conventional approach to
criterion utilization that have wide implications for criterion
measurement. Fiske (1951) has discussed criteria and suggested
defining and measuring job functions and using as criteria their
contribution to the successful functioning of lower echelons
in an organization. Stark (1959) has made much the same point,
limited to executive success, in that executive jobs would be
classified according to functions as supervising, planning,
negotiating, investigating or some combination of these. On
a more limited basis, Enell and Haas (1960) and Patton (1950)
discuss evaluation of executive performance in terms of, in
the former, comparing sub-unit targets implicit in a parti-
cular job. Both would then arrive at an evaluation of executive
performance based upon comparing unit performance with goals
set by either method. Lamouria and Harrell (1963) compensate
for differences in the importance of company objectives in 19
different departments. Differences in functions were evaluated
objectively and clinically and the resultant criterion scores
for individuals (and departments) were judged to be less con-
taminated than are clinical ratings. In effect, all of these
studies, more or less explicitly, recognize that job performance
has an outcome, the outcome can be evaluated in and of itself
or against some standard and, implicitly, the need for broader,
objective performance measures.

Toops (1959), op. cit., has called attention to many of
the points made in this review, but the number of studies at-
tempting to follow his suggestions has been limited.

Studies previously cited, McQuitty, et al. (1954), Seashore,
et al. (1960), Wherry, et al. (1961), and Stogdill, et al.
(1955), have studied job performance in the much broader
context of organizational setting and such studies seem to be
the desirable direction of personnel research in order to over-
come the generally disappointing results obtained in more limited
studies or, possibly, to make a new beginning. The designs
used in these studies show the way toward models which might begin
a more intensive investigation of the variables that do or might
affect job performance. The complexities of such studies have
been discussed by Dunnette (1963b), op. cit., using the Guetzkow
and Forehand (1961) model and they are formidable but with
modern computers the possibility of isolating job performance
bases seems to be more promising. However, the question arises
as to what to measure, how to measure and, perhaps most impor-
tant of all, can reliable measures be made? That these questions
are pertinent is indicated by the 1954 McQuitty study where,
with a "descriptive inventory" of 264 items, 23 factors were
extracted which accounted for only slightly over 50% of the
variance. The generally negative results of the Seashore
study have already been commented upon. Both the Stogdill
and Wherry studies were more encouraging, but it appears some
basic hypotheses must be evaluated before much further progress
is possible even with broadened research designs.






Probably the most important consideration is an abandon-
ment of global criteria. As Dunnette (1963), op. cit., has
pointed out, over-simplified studies consistently have ignored the
many facets of job success and, in light of the studies dis-
cussed in the second section of this review, there can be no
question of the multidimensional nature of even the simplest
job.

However, even these studies do not seem to be broad
enough in scope or time to solve the "criterion problem." It
is suggested that future investigations must be conceived on
a much broader scale in order to answer the questions posed
in the separate sections of this review.